🤖 AI Summary
This study addresses early validation of enterprise intelligent document processing (IDP) systems under limited evaluation budgets, where the goal is to uncover diverse failure mechanisms rather than a single worst-case document. The problem is formulated as a search-based software testing (SBST) task over a combinatorial space of document configurations, with the objective of maximizing the number of distinct failure types discovered within a fixed budget. The study benchmarks a portfolio of search strategies (evolutionary algorithms, swarm intelligence, quality-diversity optimization, learning-based methods, and quantum-inspired heuristics) under identical budget constraints and finds that their strengths are complementary. No single approach dominates at any evaluated budget. Each solver consistently uncovers failure modes that specific alternatives miss at comparable budgets, and the union of all solvers eventually recovers the observed failure space, whereas relying on any single method systematically delays the discovery of important risks.
📝 Abstract
Enterprise-grade Intelligent Document Processing (IDP) systems support high-stakes workflows across finance, insurance, and healthcare. Early-phase system validation under limited budgets requires uncovering diverse failure mechanisms rather than identifying a single worst-case document. We formalize this challenge as a Search-Based Software Testing (SBST) problem that targets complex interactions between document variables, with the objective of maximizing the number of distinct failure types discovered within a fixed evaluation budget. Our methodology operates on a combinatorial space of document configurations, rendering instances of structural risk features to induce realistic failure conditions. We benchmark a diverse portfolio of search strategies, spanning evolutionary, swarm-based, quality-diversity, learning-based, and quantum methods, under identical budget constraints. Through configuration-level exclusivity, win-rate, and cross-temporal overlap analyses, we show that different solvers consistently uncover failure modes that remain undiscovered by specific alternatives at comparable budgets. Crucially, cross-temporal analysis reveals persistent solver-specific discoveries across all evaluated budgets, with no single strategy exhibiting absolute dominance. While the union of all solvers eventually recovers the observed failure space, reliance on any individual method systematically delays the discovery of important risks. These results demonstrate intrinsic solver complementarity and motivate portfolio-based SBST strategies for robust industrial IDP validation.
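To make the exclusivity and union-coverage ideas concrete, the sketch below shows one plausible way to compute them from per-solver sets of discovered failure types. It is an illustration only, not the authors' analysis code: the function names, the solver labels, and the failure identifiers `F1` to `F5` are hypothetical toy values, not results from the paper.

```python
from typing import Dict, Set


def exclusive_discoveries(found: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each solver, return the failure types that no other solver found."""
    result: Dict[str, Set[str]] = {}
    for name, own in found.items():
        # Union of everything discovered by the *other* solvers.
        others = set().union(*(s for n, s in found.items() if n != name))
        result[name] = own - others
    return result


def union_coverage(found: Dict[str, Set[str]]) -> Set[str]:
    """Failure types discovered by at least one solver in the portfolio."""
    return set().union(*found.values())


def missed_by_single(found: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each solver, the failure types the portfolio found but it did not."""
    covered = union_coverage(found)
    return {name: covered - own for name, own in found.items()}


# Hypothetical toy data: failure-type IDs found by each solver at one budget.
found_at_budget = {
    "evolutionary": {"F1", "F2", "F3"},
    "swarm": {"F2", "F4"},
    "quality_diversity": {"F1", "F4", "F5"},
}

exclusive = exclusive_discoveries(found_at_budget)
covered = union_coverage(found_at_budget)
missed = missed_by_single(found_at_budget)

for solver in found_at_budget:
    print(solver, "exclusive:", sorted(exclusive[solver]),
          "missed:", sorted(missed[solver]))
print("portfolio coverage:", sorted(covered))
```

In this toy setup every solver misses at least one failure type that the portfolio as a whole recovers, which is the kind of pattern the complementarity argument rests on. Repeating the computation at several budgets would give a rough analogue of a cross-temporal overlap analysis.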