🤖 AI Summary
Existing approaches to mobile UI testing and AI agent training face challenges in modeling dynamic environments, as they rely either on physical devices or static screenshots, limiting scalability and realism. To address this, we propose an image-based interactive UI simulator that employs a two-stage paradigm: first predicting the structured layout of the next UI state, then synthesizing a visually consistent screen image conditioned on that layout. This enables high-fidelity, temporally coherent UI transition simulation. The system integrates UI layout prediction with state-of-the-art diffusion-based image generation, supporting end-to-end modeling and rendering of UI state sequences. Experiments demonstrate significant improvements over end-to-end baselines in visual authenticity, state continuity, and interaction plausibility. Our approach provides a scalable, lightweight simulation infrastructure for UI automation testing, rapid prototyping, and embodied AI agent training.
📝 Abstract
Developing and testing user interfaces (UIs) and training AI agents to interact with them are challenging due to the dynamic and diverse nature of real-world mobile environments. Existing methods often rely on cumbersome physical devices or limited static analysis of screenshots, which hinders scalable testing and the development of intelligent UI agents. We introduce UISim, a novel image-based UI simulator that offers a dynamic and interactive platform for exploring mobile phone environments purely from screen images. Our system employs a two-stage method: given an initial phone screen image and a user action, it first predicts the abstract layout of the next UI state, then synthesizes a new, visually consistent image based on this predicted layout. This approach enables the realistic simulation of UI transitions. UISim provides immediate practical benefits for UI testing, rapid prototyping, and synthetic data generation. Furthermore, its interactive capabilities pave the way for advanced applications, such as UI navigation task planning for AI agents. Our experimental results show that UISim outperforms end-to-end UI generation baselines in generating realistic and coherent subsequent UI states, highlighting its fidelity and potential to streamline UI development and enhance AI agent training.
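The two-stage transition described in the abstract can be sketched as a simple simulation loop: stage 1 predicts the next abstract layout from the current screen and action, and stage 2 renders an image conditioned on that layout. The sketch below is illustrative only; the class and function names are hypothetical, and the two model calls are stubbed placeholders rather than UISim's actual layout predictor or diffusion renderer.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical abstract layout: a list of (element_type, bounding_box) pairs.
@dataclass
class UILayout:
    elements: List[Tuple[str, Tuple[int, int, int, int]]]

@dataclass
class UIState:
    image: bytes      # screen image (placeholder: raw bytes)
    layout: UILayout  # abstract layout of this state

def predict_next_layout(image: bytes, action: str) -> UILayout:
    """Stage 1 (stub): a layout-prediction model would infer the
    structure of the next UI state from the screen and the action."""
    # Placeholder rule: a tap opens a full-screen detail view.
    if action.startswith("tap"):
        return UILayout(elements=[("detail_view", (0, 0, 1080, 1920))])
    return UILayout(elements=[("home_screen", (0, 0, 1080, 1920))])

def synthesize_image(layout: UILayout, prev_image: bytes) -> bytes:
    """Stage 2 (stub): a diffusion-based generator would render a
    visually consistent screen conditioned on the predicted layout
    and the previous frame."""
    return b"rendered:" + layout.elements[0][0].encode()

def simulate_step(state: UIState, action: str) -> UIState:
    """One interactive transition: layout first, then pixels."""
    layout = predict_next_layout(state.image, action)  # stage 1
    image = synthesize_image(layout, state.image)      # stage 2
    return UIState(image=image, layout=layout)
```

Iterating `simulate_step` over a sequence of actions yields a temporally coherent chain of simulated UI states, which is what enables the rollout-style applications (task planning, synthetic data generation) discussed above.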