🤖 AI Summary
This work addresses the challenge of automatically reproducing software defects from GUI screen recordings, a task where existing approaches often rely on brittle image heuristics, explicit touch annotations, or complex pre-constructed UI state graphs. The authors propose ViBR, a lightweight, fully automated method that requires neither application instrumentation nor prior knowledge of the UI structure and operates solely on raw screen recordings. The approach uses CLIP embedding similarity to segment the recording at action boundaries and integrates a vision-language model for region-aware GUI state comparison and guided reproduction. Experimental results show that the method successfully replays 72% of recorded defects, significantly outperforming current baselines and ablated variants, without relying on auxiliary metadata or customized recording setups.
📝 Abstract
Bug reports play a critical role in software maintenance by helping users convey the issues they encounter to developers. Recently, GUI screen capture videos have gained popularity as a bug reporting artifact due to their ease of use and ability to retain rich contextual information. However, automatically reproducing bugs from such recordings remains a significant challenge. Existing methods often rely on fragile image-processing heuristics, explicit touch indicators, or pre-constructed UI transition graphs, which require non-trivial instrumentation and app-specific setup. This paper presents ViBR, a lightweight and fully automated approach that reproduces bugs directly from GUI recordings. Specifically, ViBR combines CLIP-based embedding similarity for action boundary segmentation with Vision-Language Models (VLMs) for region-aware GUI state comparison and guided bug replay. Experimental results show that ViBR successfully reproduces 72% of bug recordings, significantly outperforming state-of-the-art baselines and ablation variants.
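To make the idea of embedding-similarity segmentation concrete, the following is a minimal sketch and not ViBR's actual implementation. It assumes each video frame has already been mapped to an embedding vector (for example by a CLIP image encoder) and marks an action boundary wherever the cosine similarity between consecutive frames drops below a threshold. The function names and the threshold value of 0.9 are hypothetical choices for illustration only.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_boundaries(
    frame_embeddings: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical value, not taken from the paper
) -> List[int]:
    """Return indices i where frame i differs sharply from frame i - 1.

    Each returned index is treated as the first frame after a likely
    user action (an assumption of this sketch).
    """
    boundaries = []
    for i in range(1, len(frame_embeddings)):
        similarity = cosine_similarity(frame_embeddings[i - 1], frame_embeddings[i])
        if similarity < threshold:
            boundaries.append(i)
    return boundaries


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real frame embeddings.
    toy_embeddings = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.0, 1.0]]
    print(segment_boundaries(toy_embeddings))
```

In the toy input, only the transition from the second to the third frame is a large change, so that is the only boundary reported. A real pipeline would likely need smoothing or a minimum segment length to cope with animations and scrolling, which this sketch leaves out.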