🤖 AI Summary
Existing mobile GUI agents typically optimize for task success rate or efficiency and overlook users' personalized privacy preferences, particularly those of privacy-first users whose interaction trajectories deviate from standard patterns. To close this gap, the study introduces Trajectory Induced Preference Optimization (TIPO), a framework that treats privacy preference modeling as a problem of structural heterogeneity in execution trajectories. TIPO combines preference-intensity weighting, which emphasizes key privacy-related steps, with padding gating, which suppresses alignment noise, to align task execution and privacy personalization jointly. On the authors' Privacy Preference Dataset, experiments with multimodal large language models report 65.60% SR (task success rate), 46.22 Compliance, and 66.67% PD, outperforming existing optimization methods across various GUI tasks.
📝 Abstract
Mobile GUI agents powered by Multimodal Large Language Models (MLLMs) can execute complex tasks on mobile devices. Despite this progress, most existing systems still optimize for task success or efficiency and neglect users' privacy personalization. In this paper, we study the often-overlooked problem of agent personalization. We observe that personalization can induce systematic structural heterogeneity in execution trajectories. For example, privacy-first users often prefer protective actions, e.g., refusing permissions, logging out, and minimizing exposure, which leads to execution trajectories that differ logically from those of utility-first users. Such variable-length and structurally different trajectories make standard preference optimization unstable and less informative. To address this issue, we propose Trajectory Induced Preference Optimization (TIPO), which uses preference-intensity weighting to emphasize key privacy-related steps and padding gating to suppress alignment noise. Results on our Privacy Preference Dataset show that TIPO improves persona alignment and distinction while preserving strong task executability, achieving 65.60% SR, 46.22 Compliance, and 66.67% PD, outperforming existing optimization methods across various GUI tasks. The code and dataset will be publicly released at https://github.com/Zhixin-L/TIPO.
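The abstract does not give the TIPO objective, so the following is only an illustrative sketch of how per-step weighting and padding gating could enter a DPO-style pairwise loss. It is not the authors' formulation. Every name here (`trajectory_score`, `preference_loss`, `beta`, the weight values) is hypothetical, and the choice to normalize by the gated weight sum is an assumption made so that trajectories of different lengths are comparable.

```python
import math

def trajectory_score(step_log_ratios, step_weights, step_mask):
    """Weighted mean of per-step log-ratios over real (non-padding) steps.

    step_log_ratios: policy log-prob minus reference log-prob for each step.
    step_weights:    hypothetical preference-intensity weights (larger for
                     privacy-related steps such as refusing a permission).
    step_mask:       1 for a real step, 0 for padding (the gate).
    """
    gated = [w * m for w, m in zip(step_weights, step_mask)]
    total = sum(gated)
    if total == 0:
        return 0.0  # trajectory with no real steps contributes nothing
    return sum(g * r for g, r in zip(gated, step_log_ratios)) / total

def preference_loss(chosen, rejected, beta=0.1):
    """DPO-style loss -log(sigmoid(margin)) for one trajectory pair."""
    margin = beta * (trajectory_score(*chosen) - trajectory_score(*rejected))
    # Numerically stable softplus(-margin).
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))

# Assumed toy data: a 4-step privacy-first trajectory (preferred for this
# persona) and a 2-step utility-first trajectory padded to length 4.
chosen = ([0.2, 0.5, 0.1, 0.4], [1.0, 3.0, 1.0, 2.0], [1, 1, 1, 1])
rejected = ([-0.1, 0.3, 9.9, 9.9], [1.0, 1.0, 1.0, 1.0], [1, 1, 0, 0])

loss = preference_loss(chosen, rejected)
```

In this sketch the padded entries of `rejected` (the `9.9` values) are multiplied by a zero gate, so they cannot change the loss, while the weight of `3.0` makes the second step of `chosen` dominate its score.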