SwipeGen: Bridging the Execution Gap in GUI Agents via Human-like Swipe Synthesis

πŸ“… 2026-01-26
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the challenge that existing GUI agents struggle to replicate human-like swiping behaviors, which has become a critical bottleneck in task execution. To overcome this limitation, we propose SwipeGen, a novel framework that, for the first time, decomposes human swiping gestures into multiple quantifiable dimensions and constructs a GUI exploration-driven synthetic data pipeline. By fine-tuning vision-language models with this synthetic data, our approach significantly enhances agents’ swiping capabilities. Our contributions include the first benchmark specifically designed for evaluating swiping performance and a new paradigm for augmenting interactive skills through synthetic data. The resulting agent, GUISwiper, achieves a swiping accuracy of 69.07%, representing a 214% improvement over current vision-language model baselines.

πŸ“ Abstract
With the widespread adoption of Graphical User Interface (GUI) agents for automating GUI interaction tasks, substantial research has focused on improving GUI perception to ground task instructions into concrete action steps. However, the step execution capability of these agents has gradually emerged as a new bottleneck for task completion. In particular, existing GUI agents often adopt overly simplified strategies for handling swipe interactions, preventing them from accurately replicating human-like behavior. To address this limitation, we decompose human swipe gestures into multiple quantifiable dimensions and propose SwipeGen, an automated pipeline that synthesizes human-like swipe interactions through GUI exploration. Based on this pipeline, we construct and release the first benchmark for evaluating the swipe execution capability of GUI agents. Furthermore, leveraging the synthesized data, we propose GUISwiper, a GUI agent with enhanced interaction execution capabilities. Experimental results demonstrate that GUISwiper achieves a swipe execution accuracy of 69.07%, representing a 214% improvement over existing VLM baselines.
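To make the idea of "decomposing a swipe into quantifiable dimensions" concrete, here is a minimal illustrative sketch. The specific dimensions chosen below (start/end points, duration, path curvature, an ease-in-out velocity profile) and the `SwipeParams`/`synthesize_swipe` names are assumptions for illustration only; the paper's actual decomposition is not reproduced here.

```python
import math
from dataclasses import dataclass

@dataclass
class SwipeParams:
    """Hypothetical quantifiable dimensions of a swipe gesture."""
    start: tuple       # (x, y) start point in pixels
    end: tuple         # (x, y) end point in pixels
    duration_ms: int   # total gesture time
    curvature: float   # lateral bow of the path; 0 = straight line
    steps: int = 20    # number of sampled touch points

def synthesize_swipe(p: SwipeParams):
    """Return (x, y, t_ms) touch samples along a quadratic Bezier path
    with an ease-in-out progress profile (slow start/end, like a finger)."""
    (x0, y0), (x1, y1) = p.start, p.end
    dx, dy = x1 - x0, y1 - y0
    length = math.hypot(dx, dy) or 1.0
    # Control point offset perpendicular to the swipe direction.
    cx = (x0 + x1) / 2 - dy / length * p.curvature
    cy = (y0 + y1) / 2 + dx / length * p.curvature
    samples = []
    for i in range(p.steps + 1):
        u = i / p.steps                        # uniform time fraction
        s = (1 - math.cos(math.pi * u)) / 2    # eased path fraction
        x = (1 - s) ** 2 * x0 + 2 * (1 - s) * s * cx + s ** 2 * x1
        y = (1 - s) ** 2 * y0 + 2 * (1 - s) * s * cy + s ** 2 * y1
        samples.append((round(x), round(y), round(u * p.duration_ms)))
    return samples

# Example: a vertical scroll-up swipe on a 1080x2400 screen.
path = synthesize_swipe(SwipeParams(start=(540, 1600), end=(540, 400),
                                    duration_ms=300, curvature=40.0))
```

Each sample could then be replayed as a touch event (e.g. via an Android input driver); varying the parameters yields the diverse, human-like trajectories that simplistic straight-line swipes lack.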
Problem

Research questions and friction points this paper is trying to address.

GUI agents
swipe interaction
execution gap
human-like behavior
task automation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Swipe Synthesis
GUI Agents
Human-like Interaction
Execution Benchmark
Gesture Decomposition
Xuan Wang
College of Computer Science and Artificial Intelligence, Fudan University
Siyuan Su
College of Computer Science and Artificial Intelligence, Fudan University
Quantong Fu
College of Computer Science and Artificial Intelligence, Fudan University
Yongxiang Hu
NASA Langley Research Center (LaRC)
Radiative transfer, lidar, snow, multiple scattering, ocean lidar
Yangfan Zhou
Professor, Fudan University
Cloud Computing, Software Engineering