🤖 AI Summary
This paper addresses the low reproducibility of software bug reports caused by ambiguous or incomplete steps to reproduce (S2Rs). The authors propose AstroBR, a cross-modal quality assessment technique that integrates large language model (LLM)-based semantic understanding with dynamic UI analysis. AstroBR constructs a program state model and performs fine-grained semantic alignment between S2R text descriptions and the corresponding GUI interaction actions, bridging the lexical variability and program-semantics gap that limited prior methods. The framework unifies natural language parsing with GUI state modeling in a single LLM-driven pipeline. Compared against a state-of-the-art baseline, AstroBR improves F1 score for S2R quality annotation by 25.2% and for suggesting missing S2Rs by 71.4%, enhancing both reproducibility and debugging efficiency.
📝 Abstract
Bug reports are essential for developers to confirm software problems, investigate their causes, and validate fixes. Unfortunately, reports often miss important information or are written unclearly, which can cause delays, increase issue-resolution effort, or even make issues impossible to solve. One of the most commonly problematic components of a report is the steps to reproduce the bug(s) (S2Rs), which are essential for replicating the described program failures and reasoning about fixes. Given the prevalence of deficiencies in reported S2Rs, prior work has proposed techniques that assist reporters in writing S2Rs or assessing their quality. However, automated understanding of S2Rs is challenging: it requires linking nuanced natural language phrases with specific, semantically related program information. Prior techniques often struggle to form such language-to-program connections, due to language variability and limitations of the information gleaned from program analyses. To more effectively tackle the problem of S2R quality annotation, we propose a new technique called AstroBR, which leverages the language understanding capabilities of LLMs to identify and extract the S2Rs from bug reports and map them to GUI interactions in a program state model derived via dynamic analysis. We compared AstroBR to a related state-of-the-art approach and found that AstroBR annotates S2Rs 25.2% better (in terms of F1 score) than the baseline. Additionally, AstroBR suggests more accurate missing S2Rs than the baseline (by 71.4% in terms of F1 score).
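To make the core idea concrete, the sketch below illustrates the kind of alignment AstroBR performs: matching each S2R sentence from a bug report against candidate GUI actions drawn from a program state model, and flagging steps that find no plausible match. This is an illustrative stand-in, not the paper's implementation: AstroBR performs this alignment with an LLM, whereas here a simple token-overlap score and the 0.3 threshold are hypothetical placeholders for the semantic matching step.

```python
def tokenize(text):
    """Lowercase and split text into a set of word tokens."""
    return set(text.lower().split())

def match_s2r_to_actions(s2r_steps, gui_actions, threshold=0.3):
    """Align each S2R step with its best-scoring GUI action.

    Returns (step, best_action, label) triples; steps whose best
    score falls below the threshold are labeled "unmatched",
    signaling a possibly ambiguous or missing step.
    """
    annotations = []
    for step in s2r_steps:
        step_tokens = tokenize(step)
        best_action, best_score = None, 0.0
        for action in gui_actions:
            overlap = step_tokens & tokenize(action)
            score = len(overlap) / max(len(step_tokens), 1)
            if score > best_score:
                best_action, best_score = action, score
        label = "matched" if best_score >= threshold else "unmatched"
        annotations.append((step, best_action, label))
    return annotations

# Toy example: S2R sentences vs. actions from a (hypothetical) state model.
steps = ["Tap the Settings button", "Enable dark mode toggle"]
actions = ["click Settings button", "toggle dark mode switch",
           "open navigation drawer"]
for step, action, label in match_s2r_to_actions(steps, actions):
    print(f"{step} -> {action} [{label}]")
```

A real system would replace the token-overlap score with an LLM query that judges whether the natural-language step and the GUI action describe the same interaction, which is what lets AstroBR handle lexical variability ("tap" vs. "click") that surface matching misses.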