├── assets
├── LICENSE
└── README.md
/assets:
--------------------------------------------------------------------------------
1 |
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2025 supernalintelligence
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Awesome GUI Compute Agents
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
18 |
19 |
20 | A curated list of GUI (Graphical User Interface) compute agents - AI systems that can see, understand, and interact with graphical interfaces like humans do.
21 |
22 |
23 | This project is maintained by [Parni](https://x.com/ParnianBrk) and [Ian](https://x.com/ian_derrington). Follow [Supernal Intelligence](https://x.com/supernalasi) for more updates.
24 |
25 | - **Website**: [supernalintelligence.com](https://www.supernalintelligence.com/)
26 | - **Join our Discord**: [Supernal Intelligence Discord](https://discord.gg/J9pU82wP)
27 |
28 | For more complete data and the latest information, please visit our website: [supernalintelligence.com](https://www.supernalintelligence.com/)
29 |
30 |
31 |
32 | ## What are GUI Compute Agents?
33 |
34 | GUI compute agents are AI systems designed to interact with graphical user interfaces just like humans do. They can:
35 | - See and understand screen elements
36 | - Click buttons, type text, and drag elements
37 | - Navigate through applications and websites
38 | - Complete complex visual workflows
39 | - Automate GUI-based tasks through natural language instructions
40 |
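The capabilities listed above amount to an observe, decide, act loop. The following is a minimal sketch of that loop under loud assumptions: `FakeScreen`, `Element`, `choose_action`, and `run_agent` are hypothetical names invented for illustration, the "screen" is an in-memory stand-in rather than a real GUI, and the rule-based policy takes the place of the vision-language model a real agent would query. It is not the implementation of any agent listed here.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A visible UI element, as a perception step might report it (hypothetical)."""
    name: str
    kind: str  # "textbox" or "button"
    text: str = ""


@dataclass
class FakeScreen:
    """In-memory stand-in for a GUI with one search box and one button."""
    elements: dict = field(default_factory=dict)
    submitted: list = field(default_factory=list)

    def observe(self):
        # A real agent would take a screenshot or read an accessibility tree.
        return list(self.elements.values())

    def type_text(self, name, text):
        self.elements[name].text = text

    def click(self, name):
        # Clicking the search button "submits" whatever is in the box.
        if name == "search_button":
            self.submitted.append(self.elements["search_box"].text)


def make_search_screen():
    """Build the toy screen used in this sketch."""
    return FakeScreen(elements={
        "search_box": Element("search_box", "textbox"),
        "search_button": Element("search_button", "button"),
    })


def choose_action(query, elements, done):
    """Toy rule-based policy standing in for a model call."""
    if done:
        return ("stop",)
    box = next(e for e in elements if e.name == "search_box")
    if box.text != query:
        return ("type", "search_box", query)
    return ("click", "search_button")


def run_agent(screen, query, max_steps=5):
    """Observe, decide, act until the policy stops or the step budget runs out."""
    trace = []
    for _ in range(max_steps):
        elements = screen.observe()
        action = choose_action(query, elements, done=query in screen.submitted)
        trace.append(action)
        if action[0] == "stop":
            break
        if action[0] == "type":
            screen.type_text(action[1], action[2])
        elif action[0] == "click":
            screen.click(action[1])
    return trace


if __name__ == "__main__":
    for step in run_agent(make_search_screen(), "gui agents"):
        print(step)
```

The step budget (`max_steps`) is a deliberate design choice in this sketch: a loop that acts on a live interface should have a hard stop even when the policy never reports completion.
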
41 | ## Contents
42 |
43 | - [Commercial Agents](#commercial-agents)
44 | - [Open Source Agents](#open-source-agents)
45 | - [Research Projects](#research-projects)
46 | - [By Environment](#by-environment)
47 |   - [Browser-Based Agents](#browser-based-agents)
48 |   - [Desktop Agents](#desktop-agents)
49 |   - [Physical World Agents](#physical-world-agents)
50 |   - [Cloud Agents](#cloud-agents)
51 |   - [Multi-Device Agents](#multi-device-agents)
52 | - [By Task Complexity](#by-task-complexity)
53 | - [Resources](#resources)
54 |
55 | ## Commercial Agents
56 |
57 | | Name | Developer | Status | Key Features | Environment |
58 | |------|-----------|--------|-------------|-------------|
59 | | [Ace](https://generalagents.com/ace/) | General Agents | Upcoming (2025) | Achieved 20× human speed on UI tasks; controls full computer via screen pixels | Desktop, Browser |
60 | | [ACT-1](https://www.adept.ai/blog/act-1) | Adept AI | Released (2022) | Pioneer in digital actions; self-correcting behavior | Desktop, Browser |
61 | | [CloudCruise](https://www.cloudcruise.com/) | CloudCruise | Released | Cloud-based GUI automation; enterprise-grade | Cloud, Browser |
62 | | [Felluo AI](https://felluo.ai) | Felluo | Released | Vision-based GUI automation; supports both browser and desktop interactions | Desktop, Browser |
63 | | [Adaptive.AI](https://adaptive.ai/) | Adaptive AI Inc | Released | AI risk management framework; technology strategy consulting | Browser |
64 | | [AgentGPT](https://github.com/reworkd/AgentGPT) | Reworkd | Released (2023) | User-friendly interface for creating goal-oriented agents | Browser |
65 | | [AI Agent Studio](https://www.automationanywhere.com/products/process-discovery) | Automation Anywhere | Released (2025) | Handles structured and unstructured data; creates AI agents for enterprise automation | Browser |
66 | | [Apple Intelligence Agents](https://www.apple.com/apple-intelligence/) | Apple | Upcoming (2025) | Deep OS integration; privacy-focused | Phone, Desktop |
67 | | [AskUI Vision Agent](https://www.askui.com/) | AskUI | Released | Cross-platform functionality without virtual machines | Desktop, Browser, Phone |
68 | | [Beam AI](https://www.beam.cloud/) | Beam | Released | Agentic Process Automation platform for customer support, onboarding, sales proposal generation | Browser |
69 | | [Claude Agent Kit](https://www.anthropic.com/solutions/agents) | Anthropic | Upcoming (2024) | Official toolkit for building Claude-powered agents | Browser |
70 | | [Claude Computer Use](https://www.anthropic.com/news/claude-computer-use) | Anthropic | Released (2024) | Works on desktop apps and browsers; AI model-based approach | Browser, Desktop, Multi-device |
71 | | [Devin](https://cognition.dev) | Cognition Labs | Upcoming (2025) | Full-stack programming capabilities with browser access | Desktop, Browser |
72 | | [Fuyu-Heavy](https://www.adept.ai/blog/adept-fuyu-heavy) | Adept AI | Released (2024) | Ranked 3rd best vision-action model behind GPT-4V and Gemini Ultra | Desktop, Browser |
73 | | [Gemini 1.5 Pro (Tool Use)](https://ai.google.dev/gemini-api) | Google | Released | Long context, tool orchestration in Workspace | Browser |
74 | | [Google Mariner](https://deepmind.google/discover/blog/introducing-mariner-ai/) | Google DeepMind | Unreleased | High WebVoyager benchmark performance | Browser |
75 | | [Gumloop](https://www.gumloop.com/) | Gumloop | Released (2023) | Visual workflow canvas; 90+ pre-built templates; Chrome extension for web automation | Browser |
76 | | [Highlight AI](https://highlight.run/) | Embedded Intelligence | Released (2024) | Instant Q&A and automation on desktop; strong privacy focus | Desktop, Browser |
77 | | [Hyperbrowser](https://www.hyperbrowser.ai/) | Hyperbrowser.ai (YC Backed) | Released (2024) | Sub-second browser launch, 10,000+ concurrent browsers, CAPTCHA solving | Browser |
78 | | [Lindy](https://lindy.ai) | Lindy.ai | Released | Virtual AI assistant for daily business tasks | Browser |
79 | | [Manus](https://monica.im/) | Monica AI (China) | Released (2025) | Marketed as the world's first general AI agent; reported SOTA on the GAIA benchmark | Desktop, Browser, Phone |
80 | | [MultiOn (now Please AI)](https://please.com/) | Please AI | Released (2023) | Multi-step web tasks end-to-end; preference learning | Browser |
81 | | [OpenAI CUA (Operator)](https://openai.com/index/cua-preview/) | OpenAI | Released (2025) | High benchmark performance; built on reasoning-model technology | Browser, Desktop |
82 | | [Perplexity Comet](https://www.perplexity.ai/comet) | Perplexity AI | Upcoming (2025) | Autonomous multi-step search with citations | Browser |
83 | | [Project Jarvis](https://ai.google/discover/jarvis/) | Google | Rumored | Computer-using agent system; few details available | Desktop, Browser |
84 | | [Proxy](https://www.proxyagent.ai/) | Convergence AI | Released (2025) | Handles concurrent sub-tasks; cheaper alternative to Operator | Browser |
85 | | [Relay](https://relay.club/) | Relay.app | Released (2021) | Clean, simple interface; extensive app integrations | Browser |
86 | | [Relevance AI](https://relevance.ai/) | Relevance AI | Released | Drag-and-drop skill building, templates, integrations | Browser |
87 | | [ServiceNow AI Agents](https://www.servicenow.com/products/now-platform-ai.html) | ServiceNow | Released | Built-in governance, analytics, text-to-action capabilities | Browser |
88 | | [Vy](https://vercept.com/) | Vercept | Released (2025) | Advanced human-computer interaction; works with existing applications | Desktop |
89 |
90 | ## Open Source Agents
91 |
92 | | Name | Developer | License | Key Features | Environment |
93 | |------|-----------|---------|-------------|-------------|
94 | | [Agent S](https://github.com/simular-ai/Agent-S) | Simular AI | Research License | Web research, content summarization, data extraction | Browser, Desktop |
95 | | [Agent S2](https://github.com/simular-ai/Agent-S) | Simular AI | Research License | OSWorld: 34.5%; AndroidWorld: 50%; outperforms OpenAI CUA/Operator | Browser, Desktop, Phone |
96 | | [AutoGen](https://github.com/microsoft/autogen) | Microsoft | MIT | Agents can converse with each other to solve tasks | Browser |
97 | | [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT) | Significant Gravitas | MIT | Pioneer in autonomous GPT agents; self-prompting with memory | Browser |
98 | | [BabyAGI](https://github.com/yoheinakajima/babyagi) | Yohei Nakajima | MIT | Autonomous task creation and prioritization | Browser |
99 | | [Browser Use](https://browser-use.com/) | Y Combinator/ETH Zurich | MIT | Makes websites more digestible for AI agents | Browser |
100 | | [c/ua (Computer-Use Agent)](https://github.com/trycua/cua) | TryCua | Open Source | High-performance virtualization; fully isolated virtual environments | Desktop, Virtual Machine |
101 | | [CogAgent](https://arxiv.org/abs/2307.13854) | Tsinghua Univ. & Zhipu | Research License (CC BY-NC) | High-performance open model rivaling closed models | Desktop, Browser |
102 | | [CrewAI](https://github.com/joaomdmoura/crewAI) | CrewAI | MIT | Enables orchestration of specialized agents in teams | Browser |
103 | | [HyperAgent](https://github.com/FSoft-AI4Code/HyperAgent) | FSoft-AI4Code | Apache 2.0 | Handles GitHub issue resolution, repository-level code generation | Browser, Desktop |
104 | | [LangGraph](https://github.com/langchain-ai/langgraph) | LangChain | MIT | Framework for building stateful, multi-agent systems | Browser |
105 | | [LLM Agents](https://github.com/NVlabs/llm-agents) | NVIDIA/Meta | Research License | Standardized evaluation for LLM agents | Browser |
106 | | [Octo](https://octo-models.github.io/) | Google DeepMind | Apache 2.0 | Zero-shot generalization to new objects and tasks | Physical World |
107 | | [OpenInterpreter](https://github.com/OpenInterpreter/open-interpreter) | Open Interpreter | AGPL-3.0 | Code interpreter for local execution | Desktop, Browser |
108 | | [OWL](https://github.com/camel-ai/agent) | Camel-AI | Proprietary | Distributed task automation | Browser |
109 | | [RooCode](https://github.com/RooCode/RooCode) | Open-source | Proprietary | Autonomous coding in VS Code | Browser, Desktop |
110 | | [Simular AI](https://github.com/simular-ai/Agent-S) | Simular | Research License | SOTA on OSWorld and AndroidWorld benchmarks | Desktop, Browser, Phone |
111 | | [Suna](https://github.com/kortix-ai/suna) | Kortix | Apache 2.0 | Highly versatile generalist agent; handles complex tasks | Browser |
112 | | [UI-TARS](https://arxiv.org/abs/2407.13063) | ByteDance/TikTok | Research License | Autonomous GUI execution on PC/Mac/Android | Browser, Desktop, Phone |
113 | | [Vercel AI SDK Computer Use](https://vercel.com/templates/next.js/ai-sdk-computer-use) | Vercel | Open Source | Standardized API for different AI models; streaming capabilities | Browser, Web |
114 | | [WebVoyager](https://arxiv.org/abs/2401.13919) | Hongliang He et al. | Research License | 59.1% success on 15-website benchmark | Browser |
115 | | [Felluo AI](https://felluo.ai) | Felluo | Proprietary | Vision-based GUI automation | Browser, Desktop |
116 |
117 | ## Research Projects
118 |
119 | | Name | Institution | Focus Area | Release Date |
120 | |------|------------|------------|-------------|
121 | | [Deep Research Agent](https://arxiv.org/abs/2307.13854) | OpenAI | Web browsing, research | 2024 |
122 | | [Gato](https://deepmind.google/discover/blog/a-generalist-agent/) | Google DeepMind | Multi-modal, multi-task, multi-embodiment | 2022 |
123 | | [HuggingGPT (Jarvis)](https://arxiv.org/abs/2303.17580) | Microsoft | Orchestrates specialists for multi-modal tasks | 2023 |
124 | | [I-AFM](https://www.microsoft.com/en-us/research/publication/interactive-agent-foundation-model/) | Microsoft Research | Multi-modal, multi-task system | 2024 |
125 | | [Magma](https://microsoft.github.io/Magma/) | Microsoft Research | Vision-language-action model | 2025 |
126 | | [mlejva's Computer Agent](https://x.com/mlejva/status/1900582528093995279) | Vasek Mlejnsky | GUI interaction | 2024 |
127 | | [PaLM-E](https://palm-e.github.io/) | Google DeepMind & Robotics at Google | Embodied multimodal language model | 2023 |
128 | | [RT-2](https://robotics-transformer2.github.io/) | Google DeepMind | Vision-language-action model | 2023 |
129 | | [SayCan](https://arxiv.org/abs/2204.01691) | Google | Grounded language model for robotics | 2022 |
130 | | [SIMA](https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/) | Google DeepMind | 3D virtual environments | 2024 |
131 | | [WebAgent](https://arxiv.org/abs/2307.13854) | Google DeepMind | Autonomous web browsing and form-filling | 2024 |
132 |
133 | ## By Environment
134 |
135 | ### Browser-Based Agents
136 |
137 | Browser-based agents specialize in navigating and interacting with web interfaces:
138 |
139 | | Name | Developer | Status | Development Type |
140 | |------|-----------|--------|------------------|
141 | | [Hyperbrowser](https://www.hyperbrowser.ai/) | Hyperbrowser.ai (YC Backed) | Released | Commercial |
142 | | [Perplexity Comet](https://www.perplexity.ai/comet) | Perplexity AI | Upcoming | Commercial |
143 | | [Browser Use](https://browser-use.com/) | Y Combinator/ETH Zurich | Released | Commercial, Open-source |
144 | | [CloudCruise](https://www.cloudcruise.com/) | CloudCruise | Released | Commercial |
145 | | [Deep Research Agent](https://arxiv.org/abs/2307.13854) | OpenAI | Unreleased | Commercial, Research |
146 | | [Felluo AI](https://felluo.ai) | Felluo | Released | Commercial |
147 | | [Google Mariner](https://deepmind.google/discover/blog/introducing-mariner-ai/) | Google DeepMind | Unreleased | Commercial, Research |
148 | | [Gumloop](https://www.gumloop.com/) | Gumloop | Released | Commercial |
149 | | [MultiOn (now Please AI)](https://please.com/) | Please AI | Released | Commercial |
150 | | [Proxy](https://www.proxyagent.ai/) | Convergence AI | Released | Commercial |
151 | | [Suna](https://github.com/kortix-ai/suna) | Kortix | Released | Open-source |
152 | | [WebVoyager](https://arxiv.org/abs/2401.13919) | Hongliang He et al. | Released | Research, Open-source |
153 |
154 | ### Desktop Agents
155 |
156 | Desktop agents interact with operating system GUIs and desktop applications:
157 |
158 | | Name | Developer | Status | Development Type |
159 | |------|-----------|--------|------------------|
160 | | [Ace](https://generalagents.com/ace/) | General Agents | Upcoming | Commercial, Research |
161 | | [Claude Computer Use](https://www.anthropic.com/news/claude-computer-use) | Anthropic | Released | Commercial |
162 | | [Felluo AI](https://felluo.ai) | Felluo | Released | Commercial |
163 | | [Fuyu-Heavy](https://www.adept.ai/blog/adept-fuyu-heavy) | Adept AI | Released | Commercial, Research |
164 | | [Highlight AI](https://highlight.run/) | Embedded Intelligence | Released | Commercial |
165 | | [OpenAI CUA (Operator)](https://openai.com/index/cua-preview/) | OpenAI | Released | Commercial |
166 | | [Project Jarvis](https://ai.google/discover/jarvis/) | Google | Rumored | Commercial, Research |
167 | | [CogAgent](https://arxiv.org/abs/2307.13854) | Tsinghua Univ. & Zhipu | Released | Research, Open-source |
168 | | [Vy](https://vercept.com/) | Vercept | Released | Commercial |
169 | | [c/ua (Computer-Use Agent)](https://github.com/trycua/cua) | TryCua | Released | Open Source |
170 |
171 | ### Physical World Agents
172 |
173 | These agents operate in 3D environments, games, and physical systems:
174 |
175 | | Name | Developer | Status | Development Type |
176 | |------|-----------|--------|------------------|
177 | | [Gato](https://deepmind.google/discover/blog/a-generalist-agent/) | Google DeepMind | Released | Research |
178 | | [I-AFM](https://www.microsoft.com/en-us/research/publication/interactive-agent-foundation-model/) | Microsoft Research | Released | Research |
179 | | [Magma](https://microsoft.github.io/Magma/) | Microsoft Research | Released | Research, Open Source |
180 | | [Octo](https://octo-models.github.io/) | Google DeepMind | Released | Open Source, Research |
181 | | [PaLM-E](https://palm-e.github.io/) | Google DeepMind & Robotics at Google | Released | Research |
182 | | [RT-2](https://robotics-transformer2.github.io/) | Google DeepMind | Released | Research |
183 | | [SayCan](https://arxiv.org/abs/2204.01691) | Google | Released | Research |
184 | | [SIMA](https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/) | Google DeepMind | Released | Research |
185 |
186 | ### Cloud Agents
187 |
188 | Cloud-based agents running in remote environments:
189 |
190 | | Name | Developer | Status | Development Type |
191 | |------|-----------|--------|------------------|
192 | | [CloudCruise](https://www.cloudcruise.com/) | CloudCruise | Released | Commercial |
193 |
194 | ### Multi-Device Agents
195 |
196 | Agents that can operate across multiple device types:
197 |
198 | | Name | Developer | Status | Supported Devices |
199 | |------|-----------|--------|-------------------|
200 | | [Agent S2](https://github.com/simular-ai/Agent-S) | Simular AI | Released | Windows, macOS, Linux, Android, iOS |
201 | | [AskUI Vision Agent](https://www.askui.com/) | AskUI | Released | Windows, macOS, Linux, Android, iOS |
202 | | [Claude Computer Use](https://www.anthropic.com/news/claude-computer-use) | Anthropic | Released | Windows, macOS, Linux, Multi-device |
203 | | [Manus](https://monica.im/) | Monica AI (China) | Released | Windows, macOS, Linux, Android, iOS |
204 | | [Simular AI](https://github.com/simular-ai/Agent-S) | Simular | Released | Windows, macOS, Linux, Android, iOS |
205 | | [UI-TARS](https://arxiv.org/abs/2407.13063) | ByteDance/TikTok | Released | Windows, macOS, Linux, Android, iOS |
206 |
207 | ## By Task Complexity
208 |
209 | *For a full breakdown of agents by task complexity, including Single Workflow, Multiple Workflow, and Complex Workflow Agents, please visit our website: [supernalintelligence.com](https://www.supernalintelligence.com/)*
210 |
211 | ## Resources
212 |
213 | ### Communities
214 |
215 | - [Supernal Intelligence Discord](https://discord.gg/J9pU82wP) - Join our community to discuss GUI agents, share resources, and connect with others
216 | - [X/Twitter: @supernalasi](https://x.com/supernalasi) - Follow for updates and news about GUI agents and AI advancements
217 | - [Website: supernalintelligence.com](https://www.supernalintelligence.com/) - Official website with more resources and information
218 |
219 | ### Related Awesome Lists
220 |
221 | - [Awesome AI Agent Leaderboards](https://github.com/supernalintelligence/Awesome-General-Agents-Leaderboard) - Comprehensive list of leaderboards for AI agents
222 | - [Awesome AI Agent Benchmarks](https://github.com/supernalintelligence/Awesome-General-Agents-Benchmark-) - Comprehensive list of benchmarks for AI agents
223 |
224 |
225 | ### Contribution
226 |
227 | Contributions are welcome! Please read the [contribution guidelines](contributing.md) first. If you spot an error or want to suggest an addition, email [i@supernal.ai](mailto:i@supernal.ai).
228 |
229 | ## License
230 |
231 | This awesome list is maintained by [Parni](https://x.com/ParnianBrk) and [Ian](https://x.com/ian_derrington), and is released under the MIT License.
232 |
--------------------------------------------------------------------------------