Quick on the Uptake: Eliciting Implicit Intents from Human Demonstrations for Personalized Mobile-Use Agents

📅 2025-08-12
🤖 AI Summary
Existing mobile-use agents model only explicit user intentions (e.g., step sequences) and neglect implicit ones (e.g., personal preferences), which limits personalization. This paper proposes IFRAgent, a framework that models both explicit and implicit intention flows from human demonstrations. It builds a user-level habit repository and a query-level vector library of standard operating procedures (SOPs), which support personalized query rewriting and SOP generation. Methodologically, IFRAgent combines demonstration learning, intention flow recognition, retrieval-augmented generation, and query rewriting. Evaluated on MobileIAR, IFRAgent improves intention alignment rate by an average of 6.79 percentage points (32.06% relative) and step completion rate by an average of 5.30 percentage points (26.34% relative) over baselines.

πŸ“ Abstract
As multimodal large language models advance rapidly, the automation of mobile tasks has become increasingly feasible through the use of mobile-use agents that mimic human interactions from graphical user interface. To further enhance mobile-use agents, previous studies employ demonstration learning to improve mobile-use agents from human demonstrations. However, these methods focus solely on the explicit intention flows of humans (e.g., step sequences) while neglecting implicit intention flows (e.g., personal preferences), which makes it difficult to construct personalized mobile-use agents. In this work, to evaluate the extbf{I}ntention extbf{A}lignment extbf{R}ate between mobile-use agents and humans, we first collect extbf{MobileIAR}, a dataset containing human-intent-aligned actions and ground-truth actions. This enables a comprehensive assessment of the agents' understanding of human intent. Then we propose extbf{IFRAgent}, a framework built upon extbf{I}ntention extbf{F}low extbf{R}ecognition from human demonstrations. IFRAgent analyzes explicit intention flows from human demonstrations to construct a query-level vector library of standard operating procedures (SOP), and analyzes implicit intention flows to build a user-level habit repository. IFRAgent then leverages a SOP extractor combined with retrieval-augmented generation and a query rewriter to generate personalized query and SOP from a raw ambiguous query, enhancing the alignment between mobile-use agents and human intent. Experimental results demonstrate that IFRAgent outperforms baselines by an average of 6.79% (32.06% relative improvement) in human intention alignment rate and improves step completion rates by an average of 5.30% (26.34% relative improvement). The codes are available at https://github.com/MadeAgents/Quick-on-the-Uptake.
Problem

Research questions and friction points this paper is trying to address.

Enhancing mobile-use agents with implicit human intentions
Improving personalization in mobile task automation
Aligning agent actions with human intent flows
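One way to picture an alignment metric of this kind is as a step-level match rate between the actions an agent predicts and the human-intent-aligned reference actions. The sketch below is only an illustration under that assumption: the `Action` fields and the `intention_alignment_rate` function are hypothetical names, and the paper's exact metric definition is not given in this summary.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Action:
    """Hypothetical action record; the fields are illustrative, not from the paper."""
    kind: str    # e.g. "click" or "type"
    target: str  # e.g. a UI element label or the text that was typed


def intention_alignment_rate(predicted: Sequence[Action],
                             reference: Sequence[Action]) -> float:
    """Fraction of steps where the predicted action equals the reference action.

    Assumes an exact-match criterion per step, which is a simplification.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same number of steps")
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


predicted = [Action("click", "Search"), Action("type", "latte"), Action("click", "Pay")]
reference = [Action("click", "Search"), Action("type", "oat latte"), Action("click", "Pay")]
rate = intention_alignment_rate(predicted, reference)
```

Here the agent matches two of three reference steps; the mismatch at the second step is the sort of gap a personal preference (an implicit intention) could explain.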
Innovation

Methods, ideas, or system contributions that make the work stand out.

Collects MobileIAR dataset for intent alignment
Proposes IFRAgent with intention flow recognition
Uses SOP extractor and query rewriter
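The flow described in the abstract (a query-level SOP library, a user-level habit repository, retrieval, and query rewriting) can be sketched roughly as follows. This is a toy illustration, not the authors' implementation: the bag-of-words `embed` function stands in for a real text encoder, the rewriter is a string template rather than a language model, and every name and data entry (`SOP_LIBRARY`, `HABITS`, `personalize`) is hypothetical.

```python
import math
from collections import Counter

# Hypothetical query-level SOP library: example query -> ordered steps.
SOP_LIBRARY = {
    "order a coffee": ["open coffee app", "pick a drink", "choose options", "pay"],
    "book a taxi": ["open ride app", "enter destination", "confirm ride"],
}

# Hypothetical user-level habit repository: user id -> observed preferences.
HABITS = {"alice": ["prefers oat milk", "pays with wallet balance"]}


def embed(text):
    """Toy bag-of-words vector; a stand-in for a learned text embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve_sop(query):
    """Return the SOP whose stored query is most similar to the incoming query."""
    q = embed(query)
    best = max(SOP_LIBRARY, key=lambda key: cosine(q, embed(key)))
    return SOP_LIBRARY[best]


def rewrite_query(user, query):
    """Attach the user's stored habits to the raw query (template-based stand-in)."""
    habits = HABITS.get(user, [])
    if not habits:
        return query
    return f"{query} ({'; '.join(habits)})"


def personalize(user, query):
    """Produce a personalized query plus a retrieved SOP for a raw query."""
    return rewrite_query(user, query), retrieve_sop(query)


query, sop = personalize("alice", "order me a coffee")
```

In this sketch the retrieved SOP is returned unchanged, whereas the abstract describes an SOP extractor combined with retrieval-augmented generation, so a fuller version would pass the retrieved steps to a generator rather than using them directly.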
Authors
Zheng Wu
School of Computer Science, Shanghai Jiao Tong University
Heyuan Huang
Johns Hopkins University
Natural Language Processing, Medical Informatics, Machine Learning, Mental Health
Yanjia Yang
School of Computer Science, Shanghai Jiao Tong University
Yuanyi Song
School of Computer Science, Shanghai Jiao Tong University
Xingyu Lou
OPPO Research Institute
Weiwen Liu
Associate Professor, Shanghai Jiao Tong University
Large Language Models, AI Agents, Recommender Systems
Weinan Zhang
School of Computer Science, Shanghai Jiao Tong University
Jun Wang
OPPO Research Institute
Zhuosheng Zhang
Assistant Professor at Shanghai Jiao Tong University
Natural Language Processing, Large Language Models, Reasoning, AI Safety, Multi-Agent Learning