UISim: An Interactive Image-Based UI Simulator for Dynamic Mobile Environments

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches to mobile UI testing and AI agent training face challenges in modeling dynamic environments, as they rely either on physical devices or static screenshots—limiting scalability and realism. To address this, we propose an image-based interactive UI simulator that employs a two-stage paradigm: first predicting the structured layout of the next UI state, then synthesizing a visually consistent screen image conditioned on that layout. This enables high-fidelity, temporally coherent UI transition simulation. The system integrates UI layout prediction with state-of-the-art diffusion-based image generation, supporting end-to-end modeling and rendering of UI state sequences. Experiments demonstrate significant improvements over end-to-end baselines in visual authenticity, state continuity, and interaction plausibility. Our approach provides a scalable, lightweight simulation infrastructure for UI automation testing, rapid prototyping, and embodied AI agent training.

📝 Abstract
Developing and testing user interfaces (UIs) and training AI agents to interact with them are challenging due to the dynamic and diverse nature of real-world mobile environments. Existing methods often rely on cumbersome physical devices or limited static analysis of screenshots, which hinders scalable testing and the development of intelligent UI agents. We introduce UISim, a novel image-based UI simulator that offers a dynamic and interactive platform for exploring mobile phone environments purely from screen images. Our system employs a two-stage method: given an initial phone screen image and a user action, it first predicts the abstract layout of the next UI state, then synthesizes a new, visually consistent image based on this predicted layout. This approach enables the realistic simulation of UI transitions. UISim provides immediate practical benefits for UI testing, rapid prototyping, and synthetic data generation. Furthermore, its interactive capabilities pave the way for advanced applications, such as UI navigation task planning for AI agents. Our experimental results show that UISim outperforms end-to-end UI generation baselines in generating realistic and coherent subsequent UI states, highlighting its fidelity and potential to streamline UI development and enhance AI agent training.
Problem

Research questions and friction points this paper is trying to address.

Simulating dynamic mobile UI interactions from screen images
Overcoming limitations of physical devices and static screenshot analysis
Generating realistic UI transitions for testing and AI training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates UI transitions from screen images
Predicts abstract layout for next UI state
Synthesizes visually consistent new UI images
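The two-stage paradigm described above can be sketched as a simple simulation loop. This is a minimal illustration with stub functions standing in for the paper's learned models (the function names, the `Layout` structure, and the action format here are all hypothetical, not the authors' API); stage 1 would be a learned layout predictor and stage 2 a layout-conditioned diffusion image generator.

```python
# Sketch of UISim's two-stage UI transition loop (illustrative stubs only).
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class UIElement:
    kind: str                          # e.g. "text", "button", "image"
    bbox: Tuple[int, int, int, int]    # (x0, y0, x1, y1) in screen pixels


@dataclass
class Layout:
    elements: List[UIElement]


def predict_next_layout(screen_image, action) -> Layout:
    """Stage 1: predict the abstract layout of the next UI state.

    Stub: a real system runs a learned model on (image, action).
    """
    if action.get("type") == "tap":
        return Layout(elements=[
            UIElement("text", (0, 0, 1080, 200)),
            UIElement("button", (100, 1800, 980, 1950)),
        ])
    return Layout(elements=[])


def synthesize_screen(layout: Layout, prev_image):
    """Stage 2: render a visually consistent screen conditioned on the layout.

    Stub: stands in for a diffusion-based image generator; returns a
    placeholder record instead of pixels.
    """
    return {
        "conditioned_on": [e.kind for e in layout.elements],
        "prev": prev_image,
    }


def simulate_step(screen_image, action):
    """One UI transition: layout prediction, then image synthesis."""
    layout = predict_next_layout(screen_image, action)
    next_image = synthesize_screen(layout, screen_image)
    return layout, next_image


layout, img = simulate_step("home_screen.png",
                            {"type": "tap", "x": 540, "y": 1875})
```

Factoring the transition through an explicit layout is what distinguishes this design from an end-to-end image-to-image baseline: the intermediate structure constrains the generator, which is what the paper credits for the gains in state continuity and interaction plausibility.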
👥 Authors
Jiannan Xiang
University of California, San Diego
Natural Language Processing
Yun Zhu
Google DeepMind
Lei Shu
Google DeepMind
Maria Wang
Google DeepMind
Lijun Yu
Google DeepMind
Video Generation · Multimodal Foundation Model
Gabriel Barcik
Google DeepMind
James Lyon
Google DeepMind
Srinivas Sunkara
Google DeepMind
Jindong Chen
Fudan University
NLP&RS&KG