🤖 AI Summary
This paper addresses the low reproducibility of software bug reports caused by ambiguous or incomplete steps to reproduce (S2Rs). The authors propose AstroBR, a cross-modal quality assessment technique that integrates large language model (LLM)-based semantic understanding with dynamic UI analysis. AstroBR constructs a program state model and performs fine-grained semantic alignment between S2R text descriptions and the corresponding GUI interaction actions, bridging the lexical variability and program-semantics gap that limited prior methods. The framework unifies natural language parsing with GUI state modeling in a single LLM-driven pipeline. Compared against a state-of-the-art baseline, AstroBR improves F1 score for S2R quality annotation by 25.2% and for suggesting missing S2Rs by 71.4%, enhancing both reproducibility and debugging efficiency.
📝 Abstract
Bug reports are essential for developers to confirm software problems, investigate their causes, and validate fixes. Unfortunately, reports often miss important information or are written unclearly, which can cause delays, increase issue-resolution effort, or even make issues impossible to solve. One of the most commonly problematic components of a report is the steps to reproduce the bug(s) (S2Rs), which are essential for replicating the described program failures and reasoning about fixes. Given the prevalence of deficiencies in reported S2Rs, prior work has proposed techniques that assist reporters in writing S2Rs or assessing their quality. However, automated understanding of S2Rs is challenging: it requires linking nuanced natural language phrases with specific, semantically related program information. Prior techniques often struggle to form such language-to-program connections, due to language variability and limitations of the information gleaned from program analyses. To more effectively tackle the problem of S2R quality annotation, we propose a new technique called AstroBR, which leverages the language understanding capabilities of LLMs to identify and extract the S2Rs from bug reports and map them to GUI interactions in a program state model derived via dynamic analysis. We compared AstroBR to a related state-of-the-art approach and found that AstroBR annotates S2Rs 25.2% better (in terms of F1 score) than the baseline. Additionally, AstroBR suggests more accurate missing S2Rs than the baseline (by 71.4% in terms of F1 score).
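To make the core idea concrete, the sketch below illustrates the kind of alignment AstroBR performs: matching each S2R sentence from a bug report against candidate GUI actions drawn from a program state model, and flagging steps that find no plausible match. This is an illustrative stand-in, not the paper's implementation: AstroBR performs this alignment with an LLM, whereas here a simple token-overlap score and the 0.3 threshold are hypothetical placeholders for the semantic matching step.

```python
def tokenize(text):
    """Lowercase and split text into a set of word tokens."""
    return set(text.lower().split())

def match_s2r_to_actions(s2r_steps, gui_actions, threshold=0.3):
    """Align each S2R step with its best-scoring GUI action.

    Returns (step, best_action, label) triples; steps whose best
    score falls below the threshold are labeled "unmatched",
    signaling a possibly ambiguous or missing step.
    """
    annotations = []
    for step in s2r_steps:
        step_tokens = tokenize(step)
        best_action, best_score = None, 0.0
        for action in gui_actions:
            overlap = step_tokens & tokenize(action)
            score = len(overlap) / max(len(step_tokens), 1)
            if score > best_score:
                best_action, best_score = action, score
        label = "matched" if best_score >= threshold else "unmatched"
        annotations.append((step, best_action, label))
    return annotations

# Toy example: S2R sentences vs. actions from a (hypothetical) state model.
steps = ["Tap the Settings button", "Enable dark mode toggle"]
actions = ["click Settings button", "toggle dark mode switch",
           "open navigation drawer"]
for step, action, label in match_s2r_to_actions(steps, actions):
    print(f"{step} -> {action} [{label}]")
```

A real system would replace the token-overlap score with an LLM query that judges whether the natural-language step and the GUI action describe the same interaction, which is what lets AstroBR handle lexical variability ("tap" vs. "click") that surface matching misses.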