├── images
│   ├── test_img.png
│   └── InfiGUIAgent_logo.jpg
└── README.md

/images/test_img.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InfiXAI/InfiGUIAgent/HEAD/images/test_img.png
--------------------------------------------------------------------------------
/images/InfiGUIAgent_logo.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/InfiXAI/InfiGUIAgent/HEAD/images/InfiGUIAgent_logo.jpg
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------

# InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

🏠 Homepage | 📚 Arxiv | 🤗 Paper | 🤗 Collection

This is the repo for the paper "[InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection](https://huggingface.co/papers/2501.04575)". In this work, we develop a GUI agent, based on a multimodal large language model, that improves task automation on computing devices. The agent is trained with two-stage supervised fine-tuning: the first stage focuses on fundamental GUI understanding skills, and the second on advanced reasoning capabilities. In the second stage we integrate hierarchical reasoning and expectation-reflection reasoning to give the agent native reasoning abilities in GUI interactions.

## 🔥 News
- 🔥[2025/5/15] Our paper "[OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use](https://os-agent-survey.github.io/)" was accepted to *ACL 2025*.
- 🔥[2025/4/19] Our paper "[InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners](https://arxiv.org/abs/2504.14239)" has been released. More information can be found in [the repository](https://github.com/Reallm-Labs/InfiGUI-R1).
- 🔥[2025/1/9] Our paper "[InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection](https://arxiv.org/abs/2501.04575)" has been released.
- 🔥[2024/12/12] Our paper "[OS Agents: A Survey on MLLM-based Agents for Computer, Phone and Browser Use](https://os-agent-survey.github.io/)" has been released.
- [2024/4/2] Our paper "[InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks](https://infiagent.github.io/)" was accepted to *ICML 2024*.

## InfiGUIAgent
We are in the process of uploading key artifacts from our paper to our 🤗 [Hugging Face Collection](https://huggingface.co/collections/Reallm-Labs/infiguiagent-67a4e4bdfbba9036a1700d97).

Regarding the full model release: because of licensing restrictions on portions of our training data that come from third-party sources, we are currently sanitizing the dataset and retraining and refining the final model to ensure full compliance while maintaining performance.

Stay tuned for updates! 🔜
--------------------------------------------------------------------------------
