TransBench: Breaking Barriers for Transferable Graphical User Interface Agents in Dynamic Digital Environments

📅 2025-05-23

🤖 AI Summary

Problem: Existing GUI agents transfer poorly in dynamic digital environments: they struggle to adapt to interface variations across app versions, platforms (iOS, Android, Web), and applications, which leads to instruction grounding failures. Method: We introduce TransBench, the first systematic benchmark for evaluating GUI agent transferability, which formally defines and quantifies transfer capability along three dimensions: version, platform, and application. It comprises a standardized test suite covering 15 mainstream application categories across multiple versions and platforms, and establishes an evaluation protocol integrating multi-source GUI data collection, vision-language alignment, interface element abstraction, and cross-domain generalization assessment. Contribution/Results: The reported experiments show significant improvements in grounding accuracy and evaluate agent performance under frequent UI updates and platform heterogeneity, filling a gap in GUI transferability evaluation.

📝 Abstract

Graphical User Interface (GUI) agents, which autonomously operate on digital interfaces through natural language instructions, hold transformative potential for accessibility, automation, and user experience. A critical aspect of their functionality is grounding: the ability to map linguistic intents to visual and structural interface elements. However, existing GUI agents often struggle to adapt to the dynamic and interconnected nature of real-world digital environments, where tasks frequently span multiple platforms and applications while also being impacted by version updates. To address this, we introduce TransBench, the first benchmark designed to systematically evaluate and enhance the transferability of GUI agents across three key dimensions: cross-version transferability (adapting to version updates), cross-platform transferability (generalizing across platforms like iOS, Android, and Web), and cross-application transferability (handling tasks spanning functionally distinct apps). TransBench includes 15 app categories with diverse functionalities, capturing essential pages across versions and platforms to enable robust evaluation. Our experiments demonstrate significant improvements in grounding accuracy, showcasing the practical utility of GUI agents in dynamic, real-world environments. Our code and data will be publicly available on GitHub.
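
To make the evaluation idea concrete, here is a minimal sketch of how grounding accuracy could be reported separately for each of the three transfer dimensions. It is not TransBench's actual protocol or data format. It assumes a common convention from GUI grounding work, where a prediction counts as correct if the predicted click point falls inside the target element's bounding box. The `Sample` record, its field names, and the helper functions are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One grounding example (hypothetical schema, not the TransBench format)."""
    dimension: str          # "version", "platform", or "application"
    bbox: tuple             # target element as (left, top, right, bottom), in pixels
    predicted_point: tuple  # the agent's predicted click as (x, y), in pixels


def is_hit(sample):
    """Assumed criterion: the predicted point lies inside the target box (edges inclusive)."""
    left, top, right, bottom = sample.bbox
    x, y = sample.predicted_point
    return left <= x <= right and top <= y <= bottom


def accuracy_by_dimension(samples):
    """Return the fraction of hits for each transfer dimension present in `samples`."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for s in samples:
        totals[s.dimension] += 1
        hits[s.dimension] += is_hit(s)  # True counts as 1, False as 0
    return {d: hits[d] / totals[d] for d in totals}


# Toy data invented for illustration only.
samples = [
    Sample("version", (10, 10, 110, 50), (60, 30)),    # inside the box
    Sample("version", (10, 10, 110, 50), (200, 30)),   # outside the box
    Sample("platform", (0, 0, 40, 40), (20, 20)),      # inside the box
    Sample("application", (5, 5, 25, 25), (30, 30)),   # outside the box
]
print(accuracy_by_dimension(samples))
```

Reporting one score per dimension, rather than a single pooled number, keeps a weakness in one kind of transfer (for example, across platforms) from being hidden by strength in another.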

Problem

Research questions and friction points this paper is trying to address.

Evaluating GUI agent transferability across version updates

Assessing GUI agent generalization across diverse platforms

Measuring GUI agent performance in multi-application tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

TransBench benchmark for GUI agent transferability

Evaluates cross-version, cross-platform, and cross-application adaptability

Improves grounding accuracy in dynamic environments
