├── demo
    ├── demo_web_frontend
    │   ├── .gitattributes
    │   ├── public
    │   │   ├── logo.png
    │   │   └── favicon.ico
    │   ├── src
    │   │   ├── assets
    │   │   │   ├── head.png
    │   │   │   ├── logo.png
    │   │   │   ├── logo.svg
    │   │   │   ├── main.css
    │   │   │   └── base.css
    │   │   ├── router
    │   │   │   └── index.js
    │   │   ├── store.js
    │   │   ├── main.js
    │   │   ├── App.vue
    │   │   └── components
    │   │   │   ├── ScreenPage.vue
    │   │   │   ├── TaskPage.vue
    │   │   │   ├── ChatPage.vue
    │   │   │   └── ReasoningPage.vue
    │   ├── .prettierrc.json
    │   ├── jsconfig.json
    │   ├── .editorconfig
    │   ├── index.html
    │   ├── .gitignore
    │   ├── vite.config.js
    │   ├── README.md
    │   ├── eslint.config.js
    │   └── package.json
    ├── demo_agent_backend.py
    └── demo_web_backend.py
├── assets
    ├── GUI-KRB.webp
    ├── compare.webp
    ├── SPA-Bench.webp
    ├── web-demo.webp
    ├── AndroidWorld.webp
    └── GUI-explorer.webp
├── __init__.py
├── utils
    ├── __init__.py
    ├── prompt_templates.py
    ├── retrieval.py
    ├── utils.py
    ├── knowledge_generation.py
    ├── memory.py
    ├── embedding_pipeline.py
    └── device.py
├── MLLM_Agent
    ├── __init__.py
    └── json_action.py
├── requirements.txt
├── .env.example
├── LICENSE
├── .gitignore
├── README.md
└── exploration_and_mining.py

/demo/demo_web_frontend/.gitattributes:
--------------------------------------------------------------------------------
1 | * text=auto eol=lf
2 | 
--------------------------------------------------------------------------------
/assets/GUI-KRB.webp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/assets/GUI-KRB.webp
--------------------------------------------------------------------------------
/assets/compare.webp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/assets/compare.webp
--------------------------------------------------------------------------------
/assets/SPA-Bench.webp:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/assets/SPA-Bench.webp -------------------------------------------------------------------------------- /assets/web-demo.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/assets/web-demo.webp -------------------------------------------------------------------------------- /assets/AndroidWorld.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/assets/AndroidWorld.webp -------------------------------------------------------------------------------- /assets/GUI-explorer.webp: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/assets/GUI-explorer.webp -------------------------------------------------------------------------------- /demo/demo_web_frontend/public/logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/demo/demo_web_frontend/public/logo.png -------------------------------------------------------------------------------- /demo/demo_web_frontend/public/favicon.ico: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/demo/demo_web_frontend/public/favicon.ico -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/assets/head.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/demo/demo_web_frontend/src/assets/head.png -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/assets/logo.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/JiuTian-VL/GUI-explorer/HEAD/demo/demo_web_frontend/src/assets/logo.png -------------------------------------------------------------------------------- /demo/demo_web_frontend/.prettierrc.json: -------------------------------------------------------------------------------- 1 | { 2 | "$schema": "https://json.schemastore.org/prettierrc", 3 | "semi": false, 4 | "singleQuote": true, 5 | "printWidth": 100 6 | } 7 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/jsconfig.json: -------------------------------------------------------------------------------- 1 | { 2 | "compilerOptions": { 3 | "paths": { 4 | "@/*": ["./src/*"] 5 | } 6 | }, 7 | "exclude": ["node_modules", "dist"] 8 | } 9 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/router/index.js: -------------------------------------------------------------------------------- 1 | import { createRouter, createWebHistory } from 'vue-router' 2 | 3 | const router = createRouter({ 4 | history: createWebHistory(import.meta.env.BASE_URL), 5 | routes: [ 6 | ], 7 | }) 8 | 9 | export default router 10 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/store.js: -------------------------------------------------------------------------------- 1 | import { defineStore } from 'pinia' 2 | export const useGlobalStore = defineStore('global', { 3 | state: () => ({ runningTask: false ,controller: null,}), 4 | actions: { toggleRunningTask() { this.runningTask = !this.runningTask } } 5 | }) -------------------------------------------------------------------------------- /demo/demo_web_frontend/.editorconfig: -------------------------------------------------------------------------------- 1 | 
[*.{js,jsx,mjs,cjs,ts,tsx,mts,cts,vue,css,scss,sass,less,styl}] 2 | charset = utf-8 3 | indent_size = 2 4 | indent_style = space 5 | insert_final_newline = true 6 | trim_trailing_whitespace = true 7 | 8 | end_of_line = lf 9 | max_line_length = 100 10 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | from dotenv import load_dotenv 3 | 4 | # Get the path of the .env file 5 | dotenv_path = os.path.join(os.path.dirname(__file__), ".env") 6 | assert os.path.exists(dotenv_path), f"{dotenv_path} not found" 7 | load_dotenv(dotenv_path=dotenv_path, verbose=True, override=True) 8 | print("Environment variables loaded") 9 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/assets/logo.svg: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | from dotenv import load_dotenv 3 | 4 | # Get the path of the .env file, which lives in the parent directory of this folder 5 | dotenv_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), ".env") 6 | assert os.path.exists(dotenv_path), f"{dotenv_path} not found" 7 | load_dotenv(dotenv_path=dotenv_path, verbose=True, override=True) 8 | print("Environment variables loaded") 9 | -------------------------------------------------------------------------------- /MLLM_Agent/__init__.py: -------------------------------------------------------------------------------- 1 | import os 2 | from dotenv import load_dotenv 3 | 4 | # Get the path of the .env file, which lives in the parent directory of this folder 5 | dotenv_path = os.path.join(os.path.dirname(os.path.dirname(__file__)), ".env") 6 | assert os.path.exists(dotenv_path), f"{dotenv_path} not found" 7 | load_dotenv(dotenv_path=dotenv_path, verbose=True, override=True) 8 | print("Environment variables loaded") 9 |
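The three `__init__.py` files above all resolve the same repository-root `.env`: the top-level package uses `os.path.dirname(__file__)`, while `utils` and `MLLM_Agent` climb one extra directory with a second `os.path.dirname`. The sketch below isolates only that path arithmetic so it can be checked without touching the filesystem or needing `python-dotenv`; the helper name `dotenv_path_for` and its `levels_up` parameter are hypothetical and do not exist in the repository.

```python
import os.path


def dotenv_path_for(module_file: str, levels_up: int = 0) -> str:
    """Return the .env path computed by the repo's __init__.py files (hypothetical helper).

    levels_up=0 mirrors /__init__.py (the .env sits next to the file);
    levels_up=1 mirrors utils/__init__.py and MLLM_Agent/__init__.py
    (the .env sits in the parent directory of the package folder).
    """
    directory = os.path.dirname(module_file)
    for _ in range(levels_up):
        # Each extra dirname() call moves one directory closer to the repo root.
        directory = os.path.dirname(directory)
    return os.path.join(directory, ".env")


# Usage with POSIX-style paths; nothing is read from disk.
print(dotenv_path_for("/repo/__init__.py"))                # /repo/.env
print(dotenv_path_for("/repo/utils/__init__.py", 1))       # /repo/.env
print(dotenv_path_for("/repo/MLLM_Agent/__init__.py", 1))  # /repo/.env
```

Because every file then calls `load_dotenv(..., override=True)`, values from `.env` take precedence over variables already set in the shell, so editing `.env` is the reliable way to change settings.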
-------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | androguard>=4 2 | python-dotenv 3 | Pillow 4 | uiautomator2>=3.2.5 5 | 6 | torch>=2.4.0 7 | torchvision>=0.19.0 8 | torchaudio>=2.4.0 9 | 10 | transformers 11 | sentencepiece 12 | protobuf 13 | faiss-cpu 14 | opencv-python-headless 15 | opencv-contrib-python-headless 16 | ImageHash 17 | gradio 18 | zstd 19 | nest-asyncio 20 | aiohttp 21 | 22 | fastapi 23 | uvicorn[standard] 24 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | GUI Agent Demo 10 | 11 | 12 | 13 |
14 | 15 | 16 | 17 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/main.js: -------------------------------------------------------------------------------- 1 | import './assets/main.css' 2 | 3 | import { createApp } from 'vue' 4 | import App from './App.vue' 5 | import router from './router' 6 | 7 | import Antd from 'ant-design-vue'; 8 | import 'ant-design-vue/dist/reset.css'; 9 | 10 | import { createPinia } from 'pinia' 11 | 12 | const app = createApp(App) 13 | 14 | app.use(Antd) 15 | app.use(router) 16 | 17 | const pinia = createPinia() 18 | app.use(pinia) 19 | 20 | app.mount('#app') 21 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/.gitignore: -------------------------------------------------------------------------------- 1 | # Logs 2 | logs 3 | *.log 4 | npm-debug.log* 5 | yarn-debug.log* 6 | yarn-error.log* 7 | pnpm-debug.log* 8 | lerna-debug.log* 9 | 10 | node_modules 11 | .DS_Store 12 | dist 13 | dist-ssr 14 | coverage 15 | *.local 16 | 17 | /cypress/videos/ 18 | /cypress/screenshots/ 19 | 20 | # Editor directories and files 21 | .vscode/* 22 | !.vscode/extensions.json 23 | .idea 24 | *.suo 25 | *.ntvs* 26 | *.njsproj 27 | *.sln 28 | *.sw? 
29 | 30 | *.tsbuildinfo 31 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/vite.config.js: -------------------------------------------------------------------------------- 1 | import { fileURLToPath, URL } from 'node:url' 2 | 3 | import { defineConfig } from 'vite' 4 | import vue from '@vitejs/plugin-vue' 5 | import vueDevTools from 'vite-plugin-vue-devtools' 6 | 7 | // https://vite.dev/config/ 8 | export default defineConfig({ 9 | plugins: [ 10 | vue(), 11 | vueDevTools(), 12 | ], 13 | resolve: { 14 | alias: { 15 | '@': fileURLToPath(new URL('./src', import.meta.url)) 16 | }, 17 | }, 18 | }) 19 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/assets/main.css: -------------------------------------------------------------------------------- 1 | @import './base.css'; 2 | 3 | /* #app { 4 | max-width: 1280px; 5 | margin: 0 auto; 6 | padding: 2rem; 7 | font-weight: normal; 8 | } */ 9 | 10 | a, 11 | .green { 12 | text-decoration: none; 13 | color: hsla(160, 100%, 37%, 1); 14 | transition: 0.4s; 15 | padding: 3px; 16 | } 17 | 18 | @media (hover: hover) { 19 | a:hover { 20 | background-color: hsla(160, 100%, 37%, 0.2); 21 | } 22 | } 23 | 24 | /* @media (min-width: 1024px) { 25 | body { 26 | display: flex; 27 | place-items: center; 28 | } 29 | 30 | #app { 31 | display: grid; 32 | grid-template-columns: 1fr 1fr; 33 | padding: 0 2rem; 34 | } 35 | } */ 36 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/README.md: -------------------------------------------------------------------------------- 1 | # vue-project 2 | 3 | This template should help get you started developing with Vue 3 in Vite. 4 | 5 | ## Recommended IDE Setup 6 | 7 | [VSCode](https://code.visualstudio.com/) + [Volar](https://marketplace.visualstudio.com/items?itemName=Vue.volar) (and disable Vetur). 
8 | 9 | ## Customize configuration 10 | 11 | See [Vite Configuration Reference](https://vite.dev/config/). 12 | 13 | ## Project Setup 14 | 15 | ```sh 16 | pnpm install 17 | ``` 18 | 19 | ### Compile and Hot-Reload for Development 20 | 21 | ```sh 22 | pnpm dev 23 | ``` 24 | 25 | ### Compile and Minify for Production 26 | 27 | ```sh 28 | pnpm build 29 | ``` 30 | 31 | ### Lint with [ESLint](https://eslint.org/) 32 | 33 | ```sh 34 | pnpm lint 35 | ``` 36 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/eslint.config.js: -------------------------------------------------------------------------------- 1 | import { defineConfig, globalIgnores } from 'eslint/config' 2 | import globals from 'globals' 3 | import js from '@eslint/js' 4 | import pluginVue from 'eslint-plugin-vue' 5 | import pluginOxlint from 'eslint-plugin-oxlint' 6 | import skipFormatting from '@vue/eslint-config-prettier/skip-formatting' 7 | 8 | export default defineConfig([ 9 | { 10 | name: 'app/files-to-lint', 11 | files: ['**/*.{js,mjs,jsx,vue}'], 12 | }, 13 | 14 | globalIgnores(['**/dist/**', '**/dist-ssr/**', '**/coverage/**']), 15 | 16 | { 17 | languageOptions: { 18 | globals: { 19 | ...globals.browser, 20 | }, 21 | }, 22 | }, 23 | 24 | js.configs.recommended, 25 | ...pluginVue.configs['flat/essential'], 26 | ...pluginOxlint.configs['flat/recommended'], 27 | skipFormatting, 28 | ]) 29 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/package.json: -------------------------------------------------------------------------------- 1 | { 2 | "name": "vue-project", 3 | "version": "0.0.0", 4 | "private": true, 5 | "type": "module", 6 | "scripts": { 7 | "dev": "vite", 8 | "build": "vite build", 9 | "preview": "vite preview", 10 | "lint:oxlint": "oxlint . --fix -D correctness --ignore-path .gitignore", 11 | "lint:eslint": "eslint . 
--fix", 12 | "lint": "run-s lint:*", 13 | "format": "prettier --write src/" 14 | }, 15 | "dependencies": { 16 | "ant-design-vue": "4.x", 17 | "pinia": "~3.0.1", 18 | "vue": "3.5.13", 19 | "vue-router": "~4.5.0" 20 | }, 21 | "devDependencies": { 22 | "@eslint/js": "^9.22.0", 23 | "@vitejs/plugin-vue": "^5.2.3", 24 | "@vue/eslint-config-prettier": "^10.2.0", 25 | "eslint": "^9.22.0", 26 | "eslint-plugin-oxlint": "^0.16.0", 27 | "eslint-plugin-vue": "~10.0.0", 28 | "globals": "^16.0.0", 29 | "npm-run-all2": "^7.0.2", 30 | "oxlint": "^0.16.0", 31 | "prettier": "3.5.3", 32 | "vite": "^6.2.5", 33 | "vite-plugin-vue-devtools": "^7.7.2" 34 | } 35 | } 36 | -------------------------------------------------------------------------------- /.env.example: -------------------------------------------------------------------------------- 1 | OPENAI_API_KEY="" 2 | OPENAI_BASE_URL="https://api.openai.com/v1" 3 | 4 | OPENAI_API_MODEL="gpt-4o" 5 | 6 | HF_ENDPOINT = "https://hf-mirror.com" 7 | #HTTP_PROXY="http://127.0.0.1:1080" 8 | 9 | # Whether to clear and regenerate the knowledge base when the version is upgraded 10 | EMPTY_KNOWLEDGE_BASE_WHEN_VERSION_UPGRADE = "False" 11 | # Whether to clear and regenerate the knowledge base when the version is downgraded 12 | EMPTY_KNOWLEDGE_BASE_WHEN_VERSION_DOWNGRADE = "True" 13 | 14 | KNOWLEDGE_BASE_ABSOLUTE_ROOT_PATH = "./knowledge_base" 15 | 16 | SERVER_EMBEDDING_DEVICE="cuda" # "cuda", "cpu", or "mps" 17 | SERVER_EMBEDDING_ENDPOINT="http://127.0.0.1:8765" # address of the embedding service; see utils/embedding_pipeline.py 18 | CLIENT_EMBEDDING_DEVICE="server" # "cuda", "cpu", "mps", or "server"; "server" requires running python -m utils.embedding_pipeline 19 | 20 | TURN_ON_DEMO_MODE="True" # whether to enable demo mode 21 | MESSAGE_SERVER_ENDPOINT="http://127.0.0.1:8768" # see demo/demo_web_backend.py 22 | DEMO_BACKEND_ENDPOINT="http://127.0.0.1:8767" # see demo/demo_agent_backend.py 23 | RAG_SERVER_ENDPOINT="http://127.0.0.1:8769" # see utils/retrieval.py 24 | 25 | LOW_RESOLUTION="False" # whether to enable low-resolution mode, which downscales screenshots to reduce the token count 26 |
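The `.env.example` above stores every switch as a quoted string (`"True"`/`"False"`), and `demo/demo_agent_backend.py` checks demo mode with `os.getenv("TURN_ON_DEMO_MODE", "False").lower() == "true"`. Below is a minimal sketch of that flag parsing, plus building (but not sending) the `POST /run_task` request against `DEMO_BACKEND_ENDPOINT`. The helper names `env_flag` and `build_run_task_request` are hypothetical. The `{"task_goal": ...}` body shape and the default port 8767 are taken from the `/run_task` handler in `demo/demo_agent_backend.py`.

```python
import json
import os
import urllib.request
from typing import Mapping, Optional


def env_flag(name: str, default: bool = False, env: Optional[Mapping[str, str]] = None) -> bool:
    """Parse a "True"/"False" string flag (hypothetical helper).

    Mirrors the demo backend's `.lower() == "true"` check; surrounding
    whitespace is also stripped here. Anything other than "true" is False.
    """
    source = os.environ if env is None else env
    raw = source.get(name)
    if raw is None:
        return default
    return raw.strip().lower() == "true"


def build_run_task_request(task_goal: str, env: Optional[Mapping[str, str]] = None) -> urllib.request.Request:
    """Build, but do not send, a POST to <DEMO_BACKEND_ENDPOINT>/run_task (hypothetical helper)."""
    source = os.environ if env is None else env
    base = source.get("DEMO_BACKEND_ENDPOINT", "http://127.0.0.1:8767").rstrip("/")
    body = json.dumps({"task_goal": task_goal}).encode("utf-8")
    return urllib.request.Request(
        base + "/run_task",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


# Usage with an explicit settings mapping instead of the real environment.
settings = {"TURN_ON_DEMO_MODE": "True", "DEMO_BACKEND_ENDPOINT": "http://127.0.0.1:8767"}
req = build_run_task_request("Open the Chrome browser", env=settings)
print(env_flag("TURN_ON_DEMO_MODE", env=settings), req.get_method(), req.full_url)
# True POST http://127.0.0.1:8767/run_task
```

Passing `urllib.request.urlopen(req)` the built request would actually submit the task, which only succeeds while `demo_agent_backend.py` is running. Accepting an explicit `env` mapping keeps both helpers testable without mutating `os.environ`.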
-------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 xieincz 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | knowledge_base 2 | exploration_output 3 | 4 | .env 5 | screenshot 6 | results 7 | config.yaml 8 | tmp 9 | *_tmp 10 | tmp* 11 | screenlog* 12 | * copy* 13 | .ruff_cache 14 | screenshot_* 15 | temp_* 16 | *test* 17 | swift 18 | flagged 19 | 20 | 21 | # Created by .ignore support plugin (hsz.mobi) 22 | ### Python template 23 | # Byte-compiled / optimized / DLL files 24 | __pycache__/ 25 | *.py[cod] 26 | *$py.class 27 | 28 | # Distribution / packaging 29 | .Python 30 | env/ 31 | build/ 32 | develop-eggs/ 33 | dist/ 34 | downloads/ 35 | eggs/ 36 | .eggs/ 37 | lib/ 38 | lib64/ 39 | parts/ 40 | sdist/ 41 | var/ 42 | *.egg-info/ 43 | .installed.cfg 44 | *.egg 45 | 46 | # PyInstaller 47 | # Usually these files are written by a python script from a template 48 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
49 | *.manifest 50 | *.spec 51 | 52 | # Installer logs 53 | pip-log.txt 54 | pip-delete-this-directory.txt 55 | 56 | # Unit test / coverage reports 57 | htmlcov/ 58 | .tox/ 59 | .coverage 60 | .coverage.* 61 | .cache 62 | nosetests.xml 63 | coverage.xml 64 | *,cover 65 | 66 | # Translations 67 | *.mo 68 | *.pot 69 | 70 | # Django stuff: 71 | *.log 72 | 73 | # Sphinx documentation 74 | docs/_build/ 75 | 76 | # PyBuilder 77 | target/ 78 | .idea 79 | droidbot_output 80 | _site 81 | Gemfile.lock 82 | .DS_Store 83 | documents 84 | temp 85 | venv 86 | apks/ 87 | output/ 88 | debug.py 89 | -------------------------------------------------------------------------------- /demo/demo_agent_backend.py: -------------------------------------------------------------------------------- 1 | """ 2 | Run with: python -m demo.demo_agent_backend 3 | """ 4 | 5 | import os 6 | from MLLM_Agent.GUI_explorer import GUI_explorer 7 | 8 | os.environ["no_proxy"] = "localhost, 127.0.0.1/8, ::1" 9 | print("Agent Service") 10 | print("Loading Agent...") 11 | assert os.getenv("TURN_ON_DEMO_MODE", "False").lower() == "true" 12 | agent = GUI_explorer() 13 | 14 | from fastapi import FastAPI, Request 15 | from fastapi.middleware.cors import CORSMiddleware 16 | 17 | app = FastAPI() 18 | app.add_middleware( 19 | CORSMiddleware, 20 | allow_origins=["*"], 21 | allow_credentials=True, 22 | allow_methods=["*"], 23 | allow_headers=["*"], 24 | ) 25 | 26 | 27 | @app.post("/run_task") 28 | async def sent_massage(request: Request): 29 | """ 30 | ```js 31 | fetch('http://127.0.0.1:8767/run_task', { 32 | method: 'POST', 33 | headers: { 34 | 'accept': 'application/json', 35 | 'Content-Type': 'application/json' 36 | }, 37 | body: JSON.stringify({ 38 | "task_goal": "Open the Chrome browser", 39 | }) 40 | }) 41 | .then(response => response.text()) 42 | .then(data => console.log(data)) 43 | .catch(error => console.error(error)); 44 | ``` 45 | """ 46 | # Parse the raw JSON from the request 47 | massage = await request.json() 48 | print(f"Received message:
{str(massage)[:30]} ...") 49 | agent.early_stop = False 50 | agent.run(massage["task_goal"]) 51 | agent.early_stop = False 52 | return "success" 53 | 54 | 55 | @app.post("/stop") 56 | async def sent_massage2(request: Request): 57 | massage = await request.json() 58 | print(f"Received message: {str(massage)[:30]} ...") 59 | agent.early_stop = True 60 | return "success" 61 | 62 | 63 | if __name__ == "__main__": 64 | import uvicorn 65 | 66 | uvicorn.run(app, host="0.0.0.0", port=8767, timeout_graceful_shutdown=3) 67 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/assets/base.css: -------------------------------------------------------------------------------- 1 | /* color palette from */ 2 | :root { 3 | --vt-c-white: #ffffff; 4 | --vt-c-white-soft: #f8f8f8; 5 | --vt-c-white-mute: #f2f2f2; 6 | 7 | --vt-c-black: #181818; 8 | --vt-c-black-soft: #222222; 9 | --vt-c-black-mute: #282828; 10 | 11 | --vt-c-indigo: #2c3e50; 12 | 13 | --vt-c-divider-light-1: rgba(60, 60, 60, 0.29); 14 | --vt-c-divider-light-2: rgba(60, 60, 60, 0.12); 15 | --vt-c-divider-dark-1: rgba(84, 84, 84, 0.65); 16 | --vt-c-divider-dark-2: rgba(84, 84, 84, 0.48); 17 | 18 | --vt-c-text-light-1: var(--vt-c-indigo); 19 | --vt-c-text-light-2: rgba(60, 60, 60, 0.66); 20 | --vt-c-text-dark-1: var(--vt-c-white); 21 | --vt-c-text-dark-2: rgba(235, 235, 235, 0.64); 22 | } 23 | 24 | /* semantic color variables for this project */ 25 | :root { 26 | --color-background: var(--vt-c-white); 27 | --color-background-soft: var(--vt-c-white-soft); 28 | --color-background-mute: var(--vt-c-white-mute); 29 | 30 | --color-border: var(--vt-c-divider-light-2); 31 | --color-border-hover: var(--vt-c-divider-light-1); 32 | 33 | --color-heading: var(--vt-c-text-light-1); 34 | --color-text: var(--vt-c-text-light-1); 35 | 36 | --section-gap: 160px; 37 | } 38 | 39 | @media (prefers-color-scheme: dark) { 40 | :root { 41 | --color-background: var(--vt-c-black); 42 | 
--color-background-soft: var(--vt-c-black-soft); 43 | --color-background-mute: var(--vt-c-black-mute); 44 | 45 | --color-border: var(--vt-c-divider-dark-2); 46 | --color-border-hover: var(--vt-c-divider-dark-1); 47 | 48 | --color-heading: var(--vt-c-text-dark-1); 49 | --color-text: var(--vt-c-text-dark-2); 50 | } 51 | } 52 | 53 | *, 54 | *::before, 55 | *::after { 56 | box-sizing: border-box; 57 | margin: 0; 58 | font-weight: normal; 59 | } 60 | 61 | body { 62 | /*min-height: 100vh;*/ 63 | color: var(--color-text); 64 | background: var(--color-background); 65 | transition: 66 | color 0.5s, 67 | background-color 0.5s; 68 | line-height: 1.6; 69 | font-family: 70 | Inter, 71 | -apple-system, 72 | BlinkMacSystemFont, 73 | 'Segoe UI', 74 | Roboto, 75 | Oxygen, 76 | Ubuntu, 77 | Cantarell, 78 | 'Fira Sans', 79 | 'Droid Sans', 80 | 'Helvetica Neue', 81 | sans-serif; 82 | font-size: 15px; 83 | text-rendering: optimizeLegibility; 84 | -webkit-font-smoothing: antialiased; 85 | -moz-osx-font-smoothing: grayscale; 86 | } 87 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/App.vue: -------------------------------------------------------------------------------- 1 | 107 | 108 | 122 | 123 | 168 | -------------------------------------------------------------------------------- /MLLM_Agent/json_action.py: -------------------------------------------------------------------------------- 1 | # Copyright 2024 The android_world Authors. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | 15 | """Represents an action for Android interaction, parsed from a JSON format.""" 16 | 17 | import dataclasses 18 | import json 19 | from typing import Optional 20 | 21 | 22 | _JSON_SEPARATORS = (',', ':') 23 | 24 | ANSWER = 'answer' 25 | CLICK = 'click' 26 | DOUBLE_TAP = 'double_tap' 27 | INPUT_TEXT = 'input_text' 28 | KEYBOARD_ENTER = 'keyboard_enter' 29 | LONG_PRESS = 'long_press' 30 | NAVIGATE_BACK = 'navigate_back' 31 | NAVIGATE_HOME = 'navigate_home' 32 | OPEN_APP = 'open_app' 33 | SCROLL = 'scroll' 34 | STATUS = 'status' 35 | SWIPE = 'swipe' 36 | UNKNOWN = 'unknown' 37 | WAIT = 'wait' 38 | 39 | _ACTION_TYPES = ( 40 | CLICK, 41 | DOUBLE_TAP, 42 | SCROLL, 43 | SWIPE, 44 | INPUT_TEXT, 45 | NAVIGATE_HOME, 46 | NAVIGATE_BACK, 47 | KEYBOARD_ENTER, 48 | OPEN_APP, 49 | STATUS, 50 | WAIT, 51 | LONG_PRESS, 52 | ANSWER, 53 | UNKNOWN, 54 | ) 55 | 56 | _SCROLL_DIRECTIONS = ('left', 'right', 'down', 'up') 57 | 58 | # Keys of JSON action. 59 | ACTION_TYPE = 'action_type' 60 | INDEX = 'index' 61 | X = 'x' 62 | Y = 'y' 63 | TEXT = 'text' 64 | DIRECTION = 'direction' 65 | APP_NAME = 'app_name' 66 | GOAL_STATUS = 'goal_status' 67 | 68 | 69 | @dataclasses.dataclass() 70 | class JSONAction: 71 | """Represents a parsed JSON action. 72 | 73 | # Example 74 | result_json = {'action_type': 'click', 'x': %d, 'y': %d} 75 | action = JSONAction(**result_json) 76 | 77 | Attributes: 78 | action_type: The action type. 79 | index: The index to click, if action is a click. Either an index or a <x, y> 80 | should be provided. See x, y attributes below. 81 | x: The x position to click, if the action is a click. 82 | y: The y position to click, if the action is a click. 83 | text: The text to type, if action is type. 84 | direction: The direction to scroll, if action is scroll. 85 | goal_status: If the status is a 'status' type, indicates the status of the 86 | goal.
87 | app_name: The app name to launch, if the action type is 'open_app'. 88 | """ 89 | 90 | action_type: Optional[str] = None 91 | index: Optional[str | int] = None 92 | x: Optional[int] = None 93 | y: Optional[int] = None 94 | text: Optional[str] = None 95 | direction: Optional[str] = None 96 | goal_status: Optional[str] = None 97 | app_name: Optional[str] = None 98 | 99 | def __post_init__(self): 100 | if self.action_type not in _ACTION_TYPES: 101 | raise ValueError(f'Invalid action type: {self.action_type}') 102 | if self.index is not None: 103 | self.index = int(self.index) 104 | if self.x is not None or self.y is not None: 105 | raise ValueError('Either an index or a <x, y> should be provided.') 106 | if self.direction and self.direction not in _SCROLL_DIRECTIONS: 107 | raise ValueError(f'Invalid scroll direction: {self.direction}') 108 | if self.text is not None and not isinstance(self.text, str): 109 | self.text = str(self.text) 110 | 111 | def __repr__(self) -> str: 112 | properties = [] 113 | for key, value in self.__dict__.items(): 114 | if value is not None: 115 | if isinstance(value, float): 116 | value = f'{value:.3f}' 117 | properties.append(f'{key}={value!r}') 118 | return f"JSONAction({', '.join(properties)})" 119 | 120 | def __eq__(self, other): 121 | if isinstance(other, JSONAction): 122 | return _compare_actions(self, other) 123 | return False 124 | 125 | def __ne__(self, other): 126 | return not self.__eq__(other) 127 | 128 | def json_str(self) -> str: 129 | non_null = {} 130 | for key, value in self.__dict__.items(): 131 | if value is not None: 132 | non_null[key] = value 133 | return json.dumps(non_null, separators=_JSON_SEPARATORS) 134 | 135 | 136 | def _compare_actions(a: JSONAction, b: JSONAction) -> bool: 137 | """Compares two JSONActions. 138 | 139 | Args: 140 | a: The first action. 141 | b: The second action. 142 | 143 | Returns: 144 | If the actions are equal. 145 | """ 146 | # Ignore case.
147 | if a.app_name is not None and b.app_name is not None: 148 | app_name_match = a.app_name.lower() == b.app_name.lower() 149 | else: 150 | app_name_match = a.app_name == b.app_name 151 | 152 | if a.text is not None and b.text is not None: 153 | text_match = a.text.lower() == b.text.lower() 154 | else: 155 | text_match = a.text == b.text 156 | 157 | # Compare the non-metadata fields. 158 | return ( 159 | app_name_match 160 | and text_match 161 | and a.action_type == b.action_type 162 | and a.index == b.index 163 | and a.x == b.x 164 | and a.y == b.y 165 | and a.direction == b.direction 166 | and a.goal_status == b.goal_status 167 | ) 168 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/components/ScreenPage.vue: -------------------------------------------------------------------------------- 1 | 31 | 190 | 191 | -------------------------------------------------------------------------------- /demo/demo_web_frontend/src/components/TaskPage.vue: -------------------------------------------------------------------------------- 1 | 18 | 19 | 187 | 188 | 203 | -------------------------------------------------------------------------------- /demo/demo_web_backend.py: -------------------------------------------------------------------------------- 1 | """ 2 | Run with: python -m demo.demo_web_backend 3 | """ 4 | 5 | import os 6 | import requests 7 | 8 | 9 | def send_message(message: dict = None, text: str = None, images: list[str] = None): 10 | url = ( 11 | os.getenv("MESSAGE_SERVER_ENDPOINT", "http://127.0.0.1:8768") 12 | + "/sent_a_massage" 13 | ) 14 | rsp, ret = None, None 15 | try: 16 | try: 17 | _message = message if message else {} 18 | if text: 19 | _message["text"] = text 20 | if images: 21 | _message["images"] = images 22 | if len(_message.keys()) == 0: 23 | return None 24 | rsp = requests.post(url, json=_message) 25 | ret = rsp.json() 26 | except: 27 | ret = rsp.text 28 | except: 29 | return ret 30 | 31 |
32 | def send_message2(message: dict = None, text: str = None, images: list[str] = None): 33 | url = ( 34 | os.getenv("MESSAGE_SERVER_ENDPOINT", "http://127.0.0.1:8768") 35 | + "/sent_a_massage2" 36 | ) 37 | rsp, ret = None, None 38 | try: 39 | try: 40 | _message = message if message else {} 41 | if text: 42 | _message["text"] = text 43 | if images: 44 | _message["images"] = images 45 | if len(_message.keys()) == 0: 46 | return None 47 | rsp = requests.post(url, json=_message) 48 | ret = rsp.json() 49 | except: 50 | ret = rsp.text 51 | except: 52 | return ret 53 | 54 | 55 | def get_a_message3(): 56 | url = ( 57 | os.getenv("MESSAGE_SERVER_ENDPOINT", "http://127.0.0.1:8768") 58 | + "/get_a_massage3" 59 | ) 60 | rsp = requests.get(url) 61 | try: 62 | try: 63 | return rsp.json() 64 | except: 65 | return rsp.text 66 | except: 67 | return None 68 | 69 | 70 | def is_need_stop() -> bool: 71 | msg = str(get_a_message3()).lower() 72 | return "stop" in msg 73 | 74 | 75 | from multiprocessing import Queue 76 | from queue import Empty as QueueEmpty 77 | from fastapi import FastAPI, Request 78 | from fastapi.middleware.cors import CORSMiddleware 79 | 80 | app = FastAPI() 81 | app.add_middleware( 82 | CORSMiddleware, 83 | allow_origins=["*"], 84 | allow_credentials=True, 85 | allow_methods=["*"], 86 | allow_headers=["*"], 87 | ) 88 | from fastapi.middleware.gzip import GZipMiddleware 89 | 90 | app.add_middleware(GZipMiddleware, minimum_size=1000, compresslevel=6) 91 | 92 | massage_queue = None 93 | massage_queue2 = None 94 | massage_queue3 = None 95 | from utils.device import Device 96 | 97 | d = Device() 98 | 99 | from PIL import Image 100 | import io 101 | import base64 102 | 103 | 104 | @app.get("/get_a_massage") 105 | async def get_massage(): 106 | """ 107 | ```js 108 | fetch('http://127.0.0.1:8768/get_a_massage', { 109 | method: 'GET', 110 | headers: { 111 | 'accept': 'application/json', 112 | 'Content-Type': 'application/json' 113 | }}) 114 | .then(response => 
response.text()) 115 | .then(data => console.log(data)) 116 | .catch(error => console.error(error)); 117 | ``` 118 | """ 119 | try: 120 | msg = massage_queue.get_nowait() 121 | # print(f"Sent message: {str(msg)[:30]} ...") 122 | return msg 123 | except QueueEmpty: 124 | # print(f"Sent message: ") 125 | return "" 126 | 127 | 128 | @app.post("/sent_a_massage") 129 | async def sent_massage(request: Request): 130 | """ 131 | ```js 132 | fetch('http://127.0.0.1:8768/sent_a_massage', { 133 | method: 'POST', 134 | headers: { 135 | 'accept': 'application/json', 136 | 'Content-Type': 'application/json' 137 | }, 138 | body: JSON.stringify({ 139 | "data": "测试任意的json body", 140 | "image":"123" 141 | }) 142 | }) 143 | .then(response => response.text()) 144 | .then(data => console.log(data)) 145 | .catch(error => console.error(error)); 146 | ``` 147 | """ 148 | # 从请求中解析原始 JSON 149 | massage = await request.json() 150 | massage_queue.put(massage) 151 | print(f"Received message: {str(massage)[:30]} ...") 152 | return "success" 153 | 154 | 155 | @app.get("/get_a_massage2") 156 | async def get_massage2(): 157 | try: 158 | msg = massage_queue2.get_nowait() 159 | return msg 160 | except QueueEmpty: 161 | return "" 162 | 163 | 164 | @app.post("/sent_a_massage2") 165 | async def sent_massage2(request: Request): 166 | massage = await request.json() 167 | massage_queue2.put(massage) 168 | print(f"Received message: {str(massage)[:30]} ...") 169 | return "success" 170 | 171 | 172 | @app.get("/get_a_massage3") 173 | async def get_massage3(): 174 | try: 175 | msg = massage_queue3.get_nowait() 176 | return msg 177 | except QueueEmpty: 178 | return "" 179 | 180 | 181 | @app.post("/sent_a_massage3") 182 | async def sent_massage3(request: Request): 183 | massage = await request.json() 184 | massage_queue3.put(massage) 185 | print(f"Received message: {str(massage)[:30]} ...") 186 | return "success" 187 | 188 | 189 | @app.post("/reset") 190 | async def reset(request: Request): 191 | massage = await 
request.json() 192 | print(f"Received message: {str(massage)[:30]} ...") 193 | d.stop_all_apps() 194 | # d.home() 195 | return "success" 196 | 197 | 198 | from fastapi.responses import StreamingResponse 199 | 200 | 201 | @app.get("/get_screenshot") 202 | async def get_screenshot(): 203 | sc = d.get_screenshot() 204 | scale = 2.4 205 | sc = sc.resize((int(sc.size[0] / scale), int(sc.size[1] / scale)), Image.LANCZOS) 206 | # sc = sc.convert("RGB") 207 | buffered = io.BytesIO() 208 | sc.save(buffered, format="WEBP", quality=75) 209 | buffered.seek(0) 210 | return StreamingResponse(buffered, media_type="image/webp") 211 | 212 | 213 | if __name__ == "__main__": 214 | print("Fast API is starting") 215 | massage_queue = Queue() 216 | massage_queue2 = Queue() 217 | massage_queue3 = Queue() 218 | import uvicorn 219 | 220 | uvicorn.run(app, host="0.0.0.0", port=8768, timeout_graceful_shutdown=3) 221 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 |

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent

Bin Xie<sup>1</sup>,
Rui Shao<sup>1*</sup>,
Gongwei Chen<sup>1*</sup>,
Kaiwen Zhou<sup>2</sup>,
Yinchuan Li<sup>2</sup>,
Jie Liu<sup>1</sup>,
Min Zhang<sup>1</sup>,
Liqiang Nie<sup>1</sup>

<sup>1</sup>Harbin Institute of Technology, Shenzhen, <sup>2</sup>Huawei Noah’s Ark Lab
*Corresponding author

Annual Meeting of the Association for Computational Linguistics (**ACL**) 2025

[[Paper]](https://arxiv.org/abs/2505.16827) [[Code]](https://github.com/JiuTian-VL/GUI-explorer) [[Project Page]](https://xieincz.github.io/GUI-explorer.github.io/)

:fire: Details will be released. Stay tuned :beers: :+1:
## If you find this work useful for your research, please kindly cite our paper and star our repo.

## Updates

- [05/2025] [Project Page](https://xieincz.github.io/GUI-explorer.github.io/) released.
- [05/2025] [arXiv paper](https://arxiv.org/abs/2505.16827) released.
- [05/2025] [Code](https://github.com/JiuTian-VL/GUI-explorer) released.

## Introduction

This is the GitHub repository of *GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent*. In this work, we propose GUI-explorer, which synergizes two key components: (1) Autonomous Exploration of Function-Aware Trajectory and (2) Unsupervised Mining of Transition-Aware Knowledge.

The overview of the proposed GUI-explorer:
## Installation

### Download

```bash
git clone https://github.com/JiuTian-VL/GUI-explorer.git
cd GUI-explorer
mkdir knowledge_base
cd knowledge_base
wget https://github.com/JiuTian-VL/GUI-explorer/releases/download/knowledge_base/knowledge_data.pkl
```

### Environment

```bash
cd GUI-explorer
conda create -n GUI_explorer python=3.12 -y
conda activate GUI_explorer
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
```

Copy `.env.example` to `.env`, then fill in your `OPENAI_API_KEY` in the `.env` file.

## Usage

### Prepare API servers

```bash
# Open a new shell window and run
cd GUI-explorer
conda activate GUI_explorer
python -m utils.embedding_pipeline

# Open a new shell window and run (wait until embedding_pipeline has finished starting up)
cd GUI-explorer
conda activate GUI_explorer
python -m utils.retrieval
```

#### Exploration

```bash
# After preparing the API servers
cd GUI-explorer
conda activate GUI_explorer
python exploration_and_mining.py -device_serial emulator-5554 -max_branching_factor 10 -max_exploration_steps 30 -max_exploration_depth 5 -package_name net.osmand
# After the knowledge_base is updated, restart `python -m utils.retrieval` so it loads the new knowledge_base
```

`device_serial` can be obtained by running `adb devices`. (If your device is not listed, follow the `Setup` section in [this tutorial](https://github.com/ai-agents-2030/SPA-Bench/blob/main/Documentation.md#setup).)

`package_name` can be obtained from the app's link on the app store. For example, in `https://play.google.com/store/apps/details?id=net.osmand`, `net.osmand` is the `package_name` for this app.
#### Demo

```bash
# After preparing the API servers
# Connect an Android device to this computer and make sure it is listed by `adb devices`.
# Open a new shell window and run
cd GUI-explorer
conda activate GUI_explorer
python -m demo.demo_web_backend

# Open a new shell window and run
cd GUI-explorer
conda activate GUI_explorer
python -m demo.demo_agent_backend

# Open a new shell window and run
cd GUI-explorer/demo/demo_web_frontend
pnpm install
pnpm run dev
```

Open http://localhost:5173 in your browser.

You should see something like this:

![web-demo](assets/web-demo.webp)

## Evaluation Results

Table 1: Main results of GUI-explorer on SPA-Bench single-app English Level 3 tasks.

![SPA-Bench](assets/SPA-Bench.webp)

Table 2: Main results of GUI-explorer on AndroidWorld tasks.

![AndroidWorld](assets/AndroidWorld.webp)

Table 3: Main results of GUI-explorer on GUI-KRB.

![GUI-KRB](assets/GUI-KRB.webp)

## Showcases

| Instruction | Video |
| :----------------------------------------------------------: | :----------------------------------------------------------: |
| Open Google Chrome and search for today's weather in Shenzhen. Carefully observe the screen and record the current weather conditions. Then, in Markor, create a note named "today.md" and write the temperature read from the webpage into it. |