🤖 AI Summary
Traditional boundary value analysis (BVA) relies on manual heuristics, while existing automated boundary value exploration (BVE) approaches mostly optimize a single quality metric, which limits coverage and behavioral diversity. This paper proposes SETBVE, the first black-box BVE framework based on quality-diversity (QD) optimization, adapting MAP-Elites to boundary testing. It organizes an archive of boundary pairs by input- and output-based behavioral descriptors and combines archive maintenance with black-box local search, so that diverse, high-quality boundary pairs are discovered and refined together. Evaluated on ten integer-based functions, SETBVE increases archive coverage by 37 to 82 percentage points over the baseline. The baseline typically plateaus in both diversity and quality after about 30 seconds, whereas SETBVE keeps improving throughout 600-second runs. Even the simplest SETBVE configurations identify diverse edge-case behaviors.
📝 Abstract
Software systems exhibit distinct behaviors based on input characteristics, and failures often occur at the boundaries between input domains. Traditional Boundary Value Analysis (BVA) relies on manual heuristics, while automated Boundary Value Exploration (BVE) methods typically optimize a single quality metric, risking a narrow and incomplete survey of boundary behaviors. We introduce SETBVE, a customizable, modular framework for automated black-box BVE that leverages Quality-Diversity (QD) optimization to systematically uncover and refine a broader spectrum of boundaries. SETBVE maintains an archive of boundary pairs organized by input- and output-based behavioral descriptors. It steers exploration toward underrepresented regions while preserving high-quality boundary pairs and applies local search to refine candidate boundaries. In experiments with ten integer-based functions, SETBVE outperforms the baseline in diversity, boosting archive coverage by 37 to 82 percentage points. A qualitative analysis reveals that SETBVE identifies boundary candidates the baseline misses. While the baseline method typically plateaus in both diversity and quality after 30 seconds, SETBVE continues to improve in 600-second runs, demonstrating better scalability. Even the simplest SETBVE configurations perform well in identifying diverse boundary behaviors. Our findings indicate that balancing quality with behavioral diversity can help identify more software edge-case behaviors than quality-focused approaches.
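The archive-and-refine loop described in the abstract can be illustrated with a small Python sketch. This is not SETBVE's implementation: the function under test `sut`, the quality metric (output difference divided by input distance), the descriptor layout, the bucket width, and the bisection-based local search are all assumptions chosen to keep the example short.

```python
import random

BUCKET = 10  # assumed width of the input-based descriptor buckets


def sut(x: int) -> str:
    """Hypothetical software under test: classifies an integer."""
    if x < 0:
        return "negative"
    if x == 0:
        return "zero"
    return "small" if x < 100 else "large"


def quality(a: int, b: int) -> float:
    """Assumed quality: output difference (0 or 1) over input distance."""
    if a == b:
        return 0.0
    return float(sut(a) != sut(b)) / abs(a - b)


def descriptor(a: int, b: int) -> tuple:
    """Assumed archive cell: input bucket plus the unordered output pair."""
    bucket = min(a, b) // BUCKET               # input-based descriptor
    outputs = tuple(sorted((sut(a), sut(b))))  # output-based descriptor
    return (bucket, outputs)


def try_insert(archive: dict, a: int, b: int) -> None:
    """Keep a pair only if its cell is empty or it beats the incumbent."""
    q = quality(a, b)
    if q == 0.0:
        return  # both inputs behave the same, so this is not a boundary pair
    cell = descriptor(a, b)
    if cell not in archive or q > archive[cell][0]:
        archive[cell] = (q, a, b)


def explore(iterations: int = 5000, seed: int = 0) -> dict:
    """Alternate random sampling with local search on archived pairs."""
    rng = random.Random(seed)
    archive: dict = {}
    for _ in range(iterations):
        if archive and rng.random() < 0.5:
            # Local search: bisect an archived pair and offer both halves.
            _, a, b = rng.choice(list(archive.values()))
            mid = (a + b) // 2
            try_insert(archive, a, mid)
            try_insert(archive, mid, b)
        else:
            # Global exploration: sample a fresh pair of nearby inputs.
            a = rng.randint(-200, 200)
            b = a + rng.randint(1, 50)
            try_insert(archive, a, b)
    return archive


if __name__ == "__main__":
    for cell, (q, a, b) in sorted(explore().items()):
        print(cell, (a, b), round(q, 3))
```

Each archive cell holds the best pair found for one combination of input region and output pair, so a sharp boundary in one region cannot crowd out weaker candidates elsewhere. A single-objective search would instead tend to converge on whichever boundary scores highest.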