🤖 AI Summary
This work addresses visually observable defects in Android applications, whose resolution is often hindered by incomplete human-submitted bug reports that omit the observed behavior (OB), expected behavior (EB), or steps to reproduce (S2Rs). The authors propose BugScribe, a novel approach that supplies app-specific GUI context, such as GUI interactions and the screens where a bug appears, to large language models (LLMs) so that they can generate clear, detailed, and accurate OB, EB, and S2Rs. They also introduce a unified quality framework that defines correctness and completeness dimensions for these components. On 48 bug reports from 26 Android apps, BugScribe produces higher-quality and more accurate components than the original human-written reports and outperforms recent LLM-based baselines.
📝 Abstract
Most defects in mobile applications are visually observable on the device screen. To track these defects, users, testers, and developers must manually submit bug reports, especially in the absence of crashes. However, these reports are frequently ambiguous or inaccurate, often omitting essential components such as the Observed Behavior (OB), Expected Behavior (EB), or Steps to Reproduce (S2Rs). Low-quality reports hinder developers' ability to understand and reproduce defects, which delays resolution and can lead to incorrect fixes or defects that remain unresolved.
In this paper, we posit that providing specific app-related information (e.g., GUI interactions or specific screens where bugs appear) to LLMs as key points of context can assist in automatically generating clear, detailed, and accurate OB, EB, and S2Rs. We built and evaluated a novel approach, BugScribe, that generates bug reports in this way. To support the evaluation, we introduce a unified quality framework that defines correctness and completeness dimensions for OB, EB, and S2Rs. Using 48 bug reports from 26 Android apps, we show that BugScribe produces higher-quality and more accurate components than the original reports and outperforms recent LLM-based baselines. We envision that BugScribe can serve as a practical assistant for testers and developers by enhancing incomplete bug reports with reliable and accurate OB, EB, and S2Rs, thereby streamlining bug resolution and improving mobile app quality.
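To make the idea of GUI context concrete, the following is a minimal sketch of how logged interactions and the affected screen might be assembled into an LLM prompt. It is not BugScribe's implementation: the `GuiStep` schema, the `build_prompt` function, and the prompt wording are all hypothetical, chosen only to illustrate the kind of app-specific information the abstract describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GuiStep:
    """One logged GUI interaction (hypothetical schema, not from the paper)."""
    action: str   # e.g. "tap", "type", "swipe"
    target: str   # label or ID of the widget acted on
    screen: str   # name of the screen where the action occurred


def build_prompt(original_report: str, steps: List[GuiStep], bug_screen: str) -> str:
    """Combine the original report with GUI context into a single prompt string."""
    lines = [
        "You are helping to complete a bug report for an Android app.",
        "",
        "Original report:",
        original_report.strip(),
        "",
        "Logged GUI interactions, in order:",
    ]
    # Number the interactions so the model can map them onto reproduction steps.
    for i, step in enumerate(steps, start=1):
        lines.append(f"{i}. {step.action} '{step.target}' on screen '{step.screen}'")
    lines += [
        "",
        f"Screen where the bug is visible: {bug_screen}",
        "",
        "Write three sections: Observed Behavior (OB), "
        "Expected Behavior (EB), and Steps to Reproduce (S2Rs).",
    ]
    return "\n".join(lines)
```

The resulting string would be sent to whichever LLM is in use; that call is omitted here because the abstract does not specify a model or API.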