🤖 AI Summary
This study addresses the challenge testers face in understanding the behavioral boundaries represented by input-output pairs in boundary value testing. It presents the first empirical evaluation of the effectiveness of natural-language explanations generated by a large language model (GPT-4.1) for such test cases. Drawing on Likert-scale ratings from 27 software professionals and semi-structured follow-up interviews with six of them, the authors conduct a mixed-methods analysis across four dimensions: clarity, correctness, completeness, and usefulness. Results indicate that 63.5% of ratings were positive (4–5 on a 5-point scale), supporting the overall acceptability of LLM-generated explanations. Based on these findings, the paper distills seven actionable design guidelines emphasizing structured expression, authoritative referencing, and adaptation of explanation depth to the reader's expertise, offering concrete direction for developing LLM-based explanation tools tailored to software testing.
📝 Abstract
Boundary value analysis and testing (BVT) is fundamental to software quality assurance because faults tend to cluster at input extremes, yet testers often struggle to understand and justify why certain input-output pairs represent meaningful behavioral boundaries. Large Language Models (LLMs) could help by producing natural-language rationales, but their value for BVT has not been empirically assessed. We therefore conducted an exploratory study of LLM-generated boundary explanations: in a survey, twenty-seven software professionals rated GPT-4.1 explanations for twenty boundary pairs on clarity, correctness, completeness, and perceived usefulness, and six of them elaborated in follow-up interviews. Overall, 63.5% of all ratings were positive (4–5 on a five-point Likert scale) compared to 17% negative (1–2), indicating broadly favorable perceptions but also notable variability. Participants favored explanations that followed a clear structure, cited authoritative sources, and adapted their depth to the reader's expertise; they also stressed the need for actionable examples to support debugging and documentation. From these insights, we distilled a seven-item requirement checklist that defines concrete design criteria for future LLM-based boundary-explanation tools. The results suggest that, with further refinement, such tools can support testing workflows by making boundary explanations more actionable and trustworthy.