🤖 AI Summary
This study addresses the challenge testers face in understanding the behavioral boundaries represented by input-output pairs in boundary value testing. It presents the first empirical evaluation of the effectiveness of natural-language explanations generated by a large language model (GPT-4.1) for such test cases. Drawing on Likert-scale ratings from 27 software professionals and semi-structured follow-up interviews with six of them, the authors conduct a mixed-methods analysis across four dimensions: clarity, correctness, completeness, and usefulness. Results indicate that 63.5% of ratings were positive (4–5 on a 5-point scale), supporting the overall acceptability of LLM-generated explanations. Based on these findings, the paper distills seven actionable design guidelines emphasizing structured expression, authoritative referencing, and adaptation of explanation depth to the reader's expertise, offering concrete direction for developing LLM-based explanation tools tailored to software testing.
📝 Abstract
Boundary value analysis and testing (BVT) is fundamental to software quality assurance because faults tend to cluster at input extremes, yet testers often struggle to understand and justify why certain input-output pairs represent meaningful behavioral boundaries. Large Language Models (LLMs) could help by producing natural-language rationales, but their value for BVT has not been empirically assessed. We therefore conducted an exploratory study of LLM-generated boundary explanations: in a survey, twenty-seven software professionals rated GPT-4.1 explanations for twenty boundary pairs on clarity, correctness, completeness, and perceived usefulness, and six of them elaborated in follow-up interviews. Overall, 63.5% of all ratings were positive (4–5 on a five-point Likert scale) compared to 17% negative (1–2), indicating broadly favorable perceptions but also notable variability. Participants favored explanations that followed a clear structure, cited authoritative sources, and adapted their depth to the reader's expertise; they also stressed the need for actionable examples to support debugging and documentation. From these insights, we distilled a seven-item requirement checklist that defines concrete design criteria for future LLM-based boundary-explanation tools. The results suggest that, with further refinement, such tools can support testing workflows by making boundary explanations more actionable and trustworthy.