🤖 AI Summary
This work addresses the challenge that existing GUI agents struggle to replicate human-like swiping behaviors, which has become a critical bottleneck in task execution. To overcome this limitation, we propose SwipeGen, a novel framework that, for the first time, decomposes human swiping gestures into multiple quantifiable dimensions and constructs a GUI exploration-driven synthetic data pipeline. By fine-tuning vision-language models with this synthetic data, our approach significantly enhances agents' swiping capabilities. Our contributions include the first benchmark specifically designed for evaluating swiping performance and a new paradigm for augmenting interactive skills through synthetic data. The resulting agent, GUISwiper, achieves a swiping accuracy of 69.07%, a 214% improvement over current vision-language model baselines.
📄 Abstract
With the widespread adoption of Graphical User Interface (GUI) agents for automating GUI interaction tasks, substantial research has focused on improving GUI perception to ground task instructions into concrete action steps. However, the step execution capability of these agents has gradually emerged as a new bottleneck for task completion. In particular, existing GUI agents often adopt overly simplified strategies for handling swipe interactions, preventing them from accurately replicating human-like behavior. To address this limitation, we decompose human swipe gestures into multiple quantifiable dimensions and propose SwipeGen, an automated pipeline that synthesizes human-like swipe interactions through GUI exploration. Based on this pipeline, we construct and release the first benchmark for evaluating the swipe execution capability of GUI agents. Furthermore, leveraging the synthesized data, we propose GUISwiper, a GUI agent with enhanced interaction execution capabilities. Experimental results demonstrate that GUISwiper achieves a swipe execution accuracy of 69.07%, a 214% improvement over existing VLM baselines.