├── assets
├── LICENSE
└── README.md


/assets:
--------------------------------------------------------------------------------


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2025 supernalintelligence

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Awesome GUI Compute Agents


A curated list of GUI (Graphical User Interface) compute agents - AI systems that can see, understand, and interact with graphical interfaces like humans do.

This project is maintained by [Parni](https://x.com/ParnianBrk) and [Ian](https://x.com/ian_derrington). Follow [Supernal Intelligence](https://x.com/supernalasi) for more updates.

**Website**: [supernalintelligence.com](https://www.supernalintelligence.com/)
**Join our Discord**: [Supernal Intelligence Discord](https://discord.gg/J9pU82wP)

For more complete data and the latest information, please visit our website: [supernalintelligence.com](https://www.supernalintelligence.com/)

## What are GUI Compute Agents?

GUI compute agents are AI systems designed to interact with graphical user interfaces just like humans do. They can:
- See and understand screen elements
- Click buttons, type text, and drag elements
- Navigate through applications and websites
- Complete complex visual workflows
- Automate GUI-based tasks through natural language instructions

## Contents

- [Commercial Agents](#commercial-agents)
- [Open Source Agents](#open-source-agents)
- [Research Projects](#research-projects)
- [By Environment](#by-environment)
  - [Browser-Based Agents](#browser-based-agents)
  - [Desktop Agents](#desktop-agents)
  - [Physical World Agents](#physical-world-agents)
  - [Cloud Agents](#cloud-agents)
  - [Multi-Device Agents](#multi-device-agents)
- [By Task Complexity](#by-task-complexity)
- [Resources](#resources)

## Commercial Agents

| Name | Developer | Status | Key Features | Environment |
|------|-----------|--------|-------------|-------------|
| [Ace](https://generalagents.com/ace/) | General Agents | Upcoming (2025) | Achieved 20× human speed on UI tasks; controls full computer via screen pixels | Desktop, Browser |
| [ACT-1](https://www.adept.ai/blog/act-1) | Adept AI | Released (2022) | Pioneer in digital actions; self-correcting behavior | Desktop, Browser |
| [CloudCruise](https://www.cloudcruise.com/) | CloudCruise | Released | Cloud-based GUI automation; enterprise-grade | Cloud, Browser |
| [Felluo AI](https://felluo.ai) | Felluo | Released | Vision-based GUI automation; supports both browser and desktop interactions | Desktop, Browser |
| [Adaptive.AI](https://adaptive.ai/) | Adaptive AI Inc | Released | AI risk management framework; technology strategy consulting | Browser |
| [AgentGPT](https://github.com/reworkd/AgentGPT) | Reworkd | Released (2023) | User-friendly interface for creating goal-oriented agents | Browser |
| [AI Agent Studio](https://www.automationanywhere.com/products/process-discovery) | Automation Anywhere | Released (2025) | Handles structured and unstructured data; creates AI agents for enterprise automation | Browser |
| [Apple Intelligence Agents](https://www.apple.com/apple-intelligence/) | Apple | Upcoming (2025) | Deep OS integration; privacy-focused | Phone, Desktop |
| [AskUI Vision Agent](https://www.askui.com/) | AskUI | Released | Cross-platform functionality without virtual machines | Desktop, Browser, Phone |
| [Beam AI](https://www.beam.cloud/) | Beam | Released | Agentic Process Automation platform for customer support, onboarding, sales proposal generation | Browser |
| [Claude Agent Kit](https://www.anthropic.com/solutions/agents) | Anthropic | Upcoming (2024) | Official toolkit for building Claude-powered agents | Browser |
| [Claude Computer Use](https://www.anthropic.com/news/claude-computer-use) | Anthropic | Released (2024) | Works on desktop apps and browsers; AI model-based approach | Browser, Desktop, Multi-device |
| [Devin](https://cognition.dev) | Cognition Labs | Upcoming (2025) | Full-stack programming capabilities with browser access | Desktop, Browser |
| [Fuyu-Heavy](https://www.adept.ai/blog/adept-fuyu-heavy) | Adept AI | Released (2024) | Ranked 3rd best vision-action model behind GPT-4V and Gemini Ultra | Desktop, Browser |
| [Gemini 1.5 Pro (Tool Use)](https://ai.google.dev/gemini-api) | Google | Released | Long context, tool orchestration in Workspace | Browser |
| [Google Mariner](https://deepmind.google/discover/blog/introducing-mariner-ai/) | Google DeepMind | Unreleased | High WebVoyager benchmark performance | Browser |
| [Gumloop](https://www.gumloop.com/) | Gumloop | Released (2023) | Visual workflow canvas; 90+ pre-built templates; Chrome extension for web automation | Browser |
| [Highlight AI](https://highlight.run/) | Embedded Intelligence | Released (2024) | Instant Q&A and automation on desktop; strong privacy focus | Desktop, Browser |
| [Hyperbrowser](https://www.hyperbrowser.ai/) | Hyperbrowser.ai (YC Backed) | Released (2024) | Sub-second browser launch, 10,000+ concurrent browsers, CAPTCHA solving | Browser |
| [Lindy](https://lindy.ai) | Lindy.ai | Released | Virtual AI assistant for daily business tasks | Browser |
| [Manus](https://monica.im/) | Monica AI (China) | Released (2024) | World's first general AI agent; SOTA on GAIA benchmark | Desktop, Browser, Phone |
| [MultiOn (now Please AI)](https://please.com/) | Please AI | Released (2023) | Multi-step web tasks end-to-end; preference learning | Browser |
| [OpenAI CUA (Operator)](https://openai.com/index/cua-preview/) | OpenAI | Released (2025) | High benchmark performance; uses reasoning-model technology | Browser, Desktop |
| [Perplexity Comet](https://www.perplexity.ai/comet) | Perplexity AI | Upcoming (2025) | Autonomous multi-step search with citations | Browser |
| [Project Jarvis](https://ai.google/discover/jarvis/) | Google | Rumored | Computer-using agent system; few details available | Desktop, Browser |
| [Proxy](https://www.proxyagent.ai/) | Convergence AI | Released (2025) | Handles concurrent sub-tasks; cheaper alternative to Operator | Browser |
| [Relay](https://relay.club/) | Relay.app | Released (2021) | Clean, simple interface; extensive app integrations | Browser |
| [Relevance AI](https://relevance.ai/) | Relevance AI | Released | Drag-and-drop skill building, templates, integrations | Browser |
| [ServiceNow AI Agents](https://www.servicenow.com/products/now-platform-ai.html) | ServiceNow | Released | Built-in governance, analytics, text-to-action capabilities | Browser |
| [Vy](https://vercept.com/) | Vercept | Released (2025) | Advanced human-computer interaction; works with existing applications | Desktop |

## Open Source Agents

| Name | Developer | License | Key Features | Environment |
|------|-----------|---------|-------------|-------------|
| [Agent S](https://github.com/simular-ai/Agent-S) | Simular AI | Research License | Web research, content summarization, data extraction | Browser, Desktop |
| [Agent S2](https://github.com/simular-ai/Agent-S) | Simular AI | Research License | OSWorld: 34.5%; AndroidWorld: 50%; outperforms OpenAI CUA/Operator | Browser, Desktop, Phone |
| [AutoGen](https://github.com/microsoft/autogen) | Microsoft | MIT | Agents can converse with each other to solve tasks | Browser |
| [AutoGPT](https://github.com/Significant-Gravitas/Auto-GPT) | Significant Gravitas | MIT | Pioneer in autonomous GPT agents; self-prompting with memory | Browser |
| [BabyAGI](https://github.com/yoheinakajima/babyagi) | Yohei Nakajima | MIT | Autonomous task creation and prioritization | Browser |
| [Browser Use](https://browser-use.com/) | Y Combinator/ETH Zurich | Proprietary | Makes websites more digestible for AI agents | Browser |
| [c/ua (Computer-Use Agent)](https://github.com/trycua/cua) | TryCua | Open Source | High-performance virtualization; fully isolated virtual environments | Desktop, Virtual Machine |
| [CogAgent](https://arxiv.org/abs/2307.13854) | Tsinghua Univ. & Zhipu | Research License (CC BY-NC) | High-performance open model rivaling closed models | Desktop, Browser |
| [CrewAI](https://github.com/joaomdmoura/crewAI) | CrewAI | Proprietary | Enables orchestration of specialized agents in teams | Browser |
| [HyperAgent](https://github.com/FSoft-AI4Code/HyperAgent) | FSoft-AI4Code | Apache 2.0 | Handles GitHub issue resolution, repository-level code generation | Browser, Desktop |
| [LangGraph](https://github.com/langchain-ai/langgraph) | LangChain | MIT | Framework for building stateful, multi-agent systems | Browser |
| [LLM Agents](https://github.com/NVlabs/llm-agents) | NVIDIA/Meta | Research License | Standardized evaluation for LLM agents | Browser |
| [Octo](https://octo-models.github.io/) | Google DeepMind | Apache 2.0 | Zero-shot generalization to new objects and tasks | Physical World |
| [OpenInterpreter](https://github.com/OpenInterpreter/open-interpreter) | Open Interpreter | Proprietary | Code interpreter for local execution | Desktop, Browser |
| [OWL](https://github.com/camel-ai/agent) | Camel-AI | Proprietary | Distributed task automation | Browser |
| [RooCode](https://github.com/RooCode/RooCode) | Open-source | Proprietary | Autonomous coding in VS Code | Browser, Desktop |
| [Simular AI](https://github.com/simular-ai/Agent-S) | Simular | Research License | SOTA on OSWorld and AndroidWorld benchmarks | Desktop, Browser, Phone |
| [Suna](https://github.com/kortix-ai/suna) | Kortix | Proprietary | Highly versatile generalist agent; handles complex tasks | Browser |
| [UI-TARS](https://arxiv.org/abs/2407.13063) | ByteDance/TikTok | Research License | Autonomous GUI execution on PC/Mac/Android | Browser, Desktop, Phone |
| [Vercel AI SDK Computer Use](https://vercel.com/templates/next.js/ai-sdk-computer-use) | Vercel | Open Source | Standardized API for different AI models; streaming capabilities | Browser, Web |
| [WebVoyager](https://arxiv.org/abs/2401.13919) | Hongliang He et al. | Research License | 59.1% success on 15-website benchmark | Browser |
| [Felluo AI](https://felluo.ai) | Felluo | Proprietary | Vision-based GUI automation | Browser, Desktop |

## Research Projects

| Name | Institution | Focus Area | Release Date |
|------|------------|------------|-------------|
| [Deep Research Agent](https://arxiv.org/abs/2307.13854) | OpenAI | Web browsing, research | 2024 |
| [Gato](https://deepmind.google/discover/blog/a-generalist-agent/) | Google DeepMind | Multi-modal, multi-task, multi-embodiment | 2022 |
| [HuggingGPT (Jarvis)](https://arxiv.org/abs/2303.17580) | Microsoft | Orchestrates specialists for multi-modal tasks | 2023 |
| [I-AFM](https://www.microsoft.com/en-us/research/publication/interactive-agent-foundation-model/) | Microsoft Research | Multi-modal, multi-task system | 2024 |
| [Magma](https://microsoft.github.io/Magma/) | Microsoft Research | Vision-language-action model | 2025 |
| [mlejva's Computer Agent](https://x.com/mlejva/status/1900582528093995279) | Vasek Mlejnsky | GUI interaction | 2024 |
| [PaLM-E](https://palm-e.github.io/) | Google DeepMind & Robotics at Google | Embodied multimodal language model | 2023 |
| [RT-2](https://robotics-transformer2.github.io/) | Google DeepMind | Vision-language-action model | 2023 |
| [SayCan](https://arxiv.org/abs/2204.01691) | Google | Grounded language model for robotics | 2022 |
| [SIMA](https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/) | Google DeepMind | 3D virtual environments | 2024 |
| [WebAgent](https://arxiv.org/abs/2307.13854) | Google DeepMind | Autonomous web browsing and form-filling | 2024 |

## By Environment

### Browser-Based Agents

Browser-based agents specialize in navigating and interacting with web interfaces:

| Name | Developer | Status | Development Type |
|------|-----------|--------|------------------|
| [Hyperbrowser](https://www.hyperbrowser.ai/) | Hyperbrowser.ai (YC Backed) | Released | Commercial |
| [Perplexity Comet](https://www.perplexity.ai/comet) | Perplexity AI | Upcoming | Commercial |
| [Browser Use](https://browser-use.com/) | Y Combinator/ETH Zurich | Released | Commercial, Open-source |
| [CloudCruise](https://www.cloudcruise.com/) | CloudCruise | Released | Commercial |
| [Deep Research Agent](https://arxiv.org/abs/2307.13854) | OpenAI | Unreleased | Commercial, Research |
| [Felluo AI](https://felluo.ai) | Felluo | Released | Commercial |
| [Google Mariner](https://deepmind.google/discover/blog/introducing-mariner-ai/) | Google DeepMind | Unreleased | Commercial, Research |
| [Gumloop](https://www.gumloop.com/) | Gumloop | Released | Commercial |
| [MultiOn (now Please AI)](https://please.com/) | Please AI | Released | Commercial |
| [Proxy](https://www.proxyagent.ai/) | Convergence AI | Released | Commercial |
| [Suna](https://github.com/kortix-ai/suna) | Kortix | Released | Open-source |
| [WebVoyager](https://arxiv.org/abs/2401.13919) | Hongliang He et al. | Released | Research, Open-source |

### Desktop Agents

Desktop agents interact with operating system GUIs and desktop applications:

| Name | Developer | Status | Development Type |
|------|-----------|--------|------------------|
| [Ace](https://generalagents.com/ace/) | General Agents | Upcoming | Commercial, Research |
| [Claude Computer Use](https://www.anthropic.com/news/claude-computer-use) | Anthropic | Released | Commercial |
| [Felluo AI](https://felluo.ai) | Felluo | Released | Commercial |
| [Fuyu-Heavy](https://www.adept.ai/blog/adept-fuyu-heavy) | Adept AI | Released | Commercial, Research |
| [Highlight AI](https://highlight.run/) | Embedded Intelligence | Released | Commercial |
| [OpenAI CUA (Operator)](https://openai.com/index/cua-preview/) | OpenAI | Released | Commercial |
| [Project Jarvis](https://ai.google/discover/jarvis/) | Google | Rumored | Commercial, Research |
| [CogAgent](https://arxiv.org/abs/2307.13854) | Tsinghua Univ. & Zhipu | Released | Research, Open-source |
| [Vy](https://vercept.com/) | Vercept | Released | Commercial |
| [c/ua (Computer-Use Agent)](https://github.com/trycua/cua) | TryCua | Released | Open Source |

### Physical World Agents

These agents operate in 3D environments, games, and physical systems:

| Name | Developer | Status | Development Type |
|------|-----------|--------|------------------|
| [Gato](https://deepmind.google/discover/blog/a-generalist-agent/) | Google DeepMind | Released | Research |
| [I-AFM](https://www.microsoft.com/en-us/research/publication/interactive-agent-foundation-model/) | Microsoft Research | Released | Research |
| [Magma](https://microsoft.github.io/Magma/) | Microsoft Research | Released | Research, Open Source |
| [Octo](https://octo-models.github.io/) | Google DeepMind | Released | Open Source, Research |
| [PaLM-E](https://palm-e.github.io/) | Google DeepMind & Robotics at Google | Released | Research |
| [RT-2](https://robotics-transformer2.github.io/) | Google DeepMind | Released | Research |
| [SayCan](https://arxiv.org/abs/2204.01691) | Google | Released | Research |
| [SIMA](https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/) | Google DeepMind | Released | Research |

### Cloud Agents

Cloud-based agents running in remote environments:

| Name | Developer | Status | Development Type |
|------|-----------|--------|------------------|
| [CloudCruise](https://www.cloudcruise.com/) | CloudCruise | Released | Commercial |

### Multi-Device Agents

Agents that can operate across multiple device types:

| Name | Developer | Status | Supported Devices |
|------|-----------|--------|-------------------|
| [Agent S2](https://github.com/simular-ai/Agent-S) | Simular AI | Released | Windows, macOS, Linux, Android, iOS |
| [AskUI Vision Agent](https://www.askui.com/) | AskUI | Released | Windows, macOS, Linux, Android, iOS |
| [Claude Computer Use](https://www.anthropic.com/news/claude-computer-use) | Anthropic | Released | Windows, macOS, Linux, Multi-device |
| [Manus](https://monica.im/) | Monica AI (China) | Released | Windows, macOS, Linux, Android, iOS |
| [Simular AI](https://github.com/simular-ai/Agent-S) | Simular | Released | Windows, macOS, Linux, Android, iOS |
| [UI-TARS](https://arxiv.org/abs/2407.13063) | ByteDance/TikTok | Released | Windows, macOS, Linux, Android, iOS |

## By Task Complexity

*For a full breakdown of agents by task complexity, including Single Workflow, Multiple Workflow, and Complex Workflow Agents, please visit our website: [supernalintelligence.com](https://www.supernalintelligence.com/)*

## Resources

### Communities

- [Supernal Intelligence Discord](https://discord.gg/J9pU82wP) - Join our community to discuss GUI agents, share resources, and connect with others
- [X/Twitter: @supernalasi](https://x.com/supernalasi) - Follow for updates and news about GUI agents and AI advancements
- [Website: supernalintelligence.com](https://www.supernalintelligence.com/) - Official website with more resources and information

### Related Awesome Lists

- [Awesome AI Agent Leaderboards](https://github.com/supernalintelligence/Awesome-General-Agents-Leaderboard) - Comprehensive list of leaderboards for AI agents
- [Awesome AI Agent Benchmarks](https://github.com/supernalintelligence/Awesome-General-Agents-Benchmark-) - Comprehensive list of benchmarks for AI agents

### Contribution

Contributions welcome! Please read the [contribution guidelines](contributing.md) first or email [i@supernal.ai](mailto:i@supernal.ai) if you see an error or want to contribute.

## License

This awesome list is maintained by [Parni](https://x.com/ParnianBrk) and [Ian](https://x.com/ian_derrington), and is released under the MIT Open Source License.

--------------------------------------------------------------------------------