├── images
│   ├── test_img.png
│   └── InfiGUIAgent_logo.jpg
└── README.md
/images/test_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InfiXAI/InfiGUIAgent/HEAD/images/test_img.png
--------------------------------------------------------------------------------
/images/InfiGUIAgent_logo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InfiXAI/InfiGUIAgent/HEAD/images/InfiGUIAgent_logo.jpg
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
4 | # InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection
5 |
16 | This repository accompanies the paper "[InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection](https://huggingface.co/papers/2501.04575)". In this work, we develop a GUI agent, built on a multimodal large language model, that automates tasks on computing devices. The agent is trained with a two-stage supervised fine-tuning approach: the first stage targets fundamental GUI understanding skills, and the second targets advanced reasoning, integrating hierarchical reasoning and expectation-reflection reasoning to give the agent native reasoning abilities in GUI interactions.
17 |
18 | ## 🔥 News
19 | - 🔥[2025/5/15] Our paper "[OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use](https://os-agent-survey.github.io/)" was accepted to *ACL 2025*.
20 | - 🔥[2025/4/19] Our paper "[InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners](https://arxiv.org/abs/2504.14239)" was released. More information can be found in [the repository](https://github.com/Reallm-Labs/InfiGUI-R1).
21 | - 🔥[2025/1/9] Our paper "[InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection](https://arxiv.org/abs/2501.04575)" was released.
22 | - 🔥[2024/12/12] Our paper "[OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use](https://os-agent-survey.github.io/)" was released.
23 | - [2024/4/2] Our paper "[InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks](https://infiagent.github.io/)" was accepted to *ICML 2024*.
24 |
25 | ## InfiGUIAgent
26 | We are in the process of uploading key artifacts from our paper to our 🤗 [Hugging Face Collection](https://huggingface.co/collections/Reallm-Labs/infiguiagent-67a4e4bdfbba9036a1700d97).
27 |
28 | The full model has not been released yet. Because portions of our training data come from third-party sources with licensing restrictions, we are currently sanitizing the dataset and retraining and refining the final model to ensure full compliance while maintaining performance.
29 |
30 | Stay tuned for updates! 🔜
31 |
--------------------------------------------------------------------------------