🤖 AI Summary
To address the challenge of generating goal-directed test inputs for complex software, this paper proposes FdLoop, a feedback-driven, probabilistic grammar-based iterative optimization method. Its core innovation lies in dynamically coupling test feedback—such as code coverage, execution time, and exception triggering—with probabilistic context-free grammars (PCFGs), enabling adaptive evolution of input distributions via grammar mutation, input mutation, and online PCFG model updates. FdLoop supports multiple input formats—including JSON, CSS, and JavaScript—and targets diverse testing objectives, such as exception triggering, high-complexity path coverage, and prolonged execution time. Experimental evaluation across 20 open-source projects demonstrates that FdLoop outperforms existing approaches in 86% of settings; it achieves up to twice the exception-triggering capability of the strongest baseline, EvoGFuzz, while extending to multi-objective testing and scaling across parameter settings.
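The feedback loop described above—sample inputs from a PCFG, score them with test feedback, and shift probability mass toward the rules that produced high-scoring inputs—can be sketched as follows. This is a minimal illustration of the general technique, not FdLoop's implementation; the toy grammar, the learning rate, and the use of input length as a stand-in feedback signal are all assumptions.

```python
import random

# Toy PCFG over nested lists of digits. All names and numbers here are
# illustrative stand-ins, not FdLoop's actual data structures or parameters.
grammar = {
    "<value>": [(["<digit>"], 0.5), (["[", "<value>", "]"], 0.5)],
    "<digit>": [(["0"], 0.5), (["1"], 0.5)],
}

def sample(g, symbol="<value>", depth=0, max_depth=10, trace=None):
    """Sample one string from the PCFG, recording which rule was chosen
    at each expansion so those choices can be rewarded later."""
    if trace is None:
        trace = []
    if symbol not in g:
        return symbol, trace  # terminal symbol
    rules = g[symbol]
    if depth >= max_depth:
        # Force the shortest expansion to guarantee termination.
        idx = min(range(len(rules)), key=lambda i: len(rules[i][0]))
    else:
        idx = random.choices(range(len(rules)), weights=[p for _, p in rules])[0]
    trace.append((symbol, idx))
    text = "".join(sample(g, s, depth + 1, max_depth, trace)[0] for s in rules[idx][0])
    return text, trace

def reinforce(g, trace, lr=0.2):
    """Online PCFG update: shift probability mass toward the rules that
    produced a high-scoring input, then renormalize."""
    for symbol, idx in trace:
        rules = g[symbol]
        bumped = [(e, p + lr if i == idx else p) for i, (e, p) in enumerate(rules)]
        total = sum(p for _, p in bumped)
        g[symbol] = [(e, p / total) for e, p in bumped]

# Feedback loop: input length stands in for a real signal such as
# execution time or coverage.
random.seed(0)
for _ in range(200):
    batch = [sample(grammar) for _ in range(10)]
    best_text, best_trace = max(batch, key=lambda it: len(it[0]))
    reinforce(grammar, best_trace)

# The learned grammar should now favor the recursive (nesting) rule,
# since deeper nesting yields longer inputs.
p_nest = {tuple(e): p for e, p in grammar["<value>"]}[("[", "<value>", "]")]
print(f"learned nesting probability: {p_nest:.2f}")
```

The key design point is that learning happens on the rule-choice trace rather than on the raw input string, so the grammar update directly biases future sampling toward structures correlated with the chosen feedback signal.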
📝 Abstract
To effectively test complex software, it is important to generate goal-specific inputs, i.e., inputs that achieve a specific testing goal. However, most state-of-the-art test generators are not designed to target specific goals. Notably, grammar-based test generators, which (randomly) produce syntactically valid inputs from an input specification (i.e., a grammar), have a low probability of achieving an arbitrary testing goal. This work addresses this challenge by proposing an automated test generation approach (called FdLoop) which iteratively learns relevant input properties from existing inputs to drive the generation of goal-specific inputs. Given a testing goal, FdLoop iteratively selects, evolves, and learns the input distribution of goal-specific test inputs via test feedback and a probabilistic grammar. We concretize FdLoop for four testing goals, namely unique code coverage, input-to-code complexity, program failures (exceptions), and long execution time. We evaluate FdLoop using three well-known input formats (JSON, CSS, and JavaScript) and 20 open-source programs. In most (86%) settings, FdLoop outperforms all five tested baselines, namely the baseline grammar-based test generators (random, probabilistic, and inverse-probabilistic methods), EvoGFuzz, and DynaMosa. FdLoop is (up to) twice (2X) as effective as the best baseline (EvoGFuzz) in inducing erroneous behaviors. In addition, we show that the main components of FdLoop (i.e., the input mutator, grammar mutator, and test feedback) contribute positively to its effectiveness. Finally, our evaluation demonstrates that FdLoop effectively achieves single testing goals (revealing erroneous behaviors, generating complex inputs, or inducing long execution time) and scales to multiple testing goals across varying parameter settings.
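The four testing goals in the abstract each correspond to a feedback signal collected while executing a generated input. A hedged sketch of what such goal-specific feedback collection and scoring might look like is shown below; `run_with_feedback` and `score` are illustrative names, not FdLoop's actual instrumentation, and input size is used only as a crude proxy for input-to-code complexity.

```python
import json
import time

def run_with_feedback(program, test_input):
    """Run the program under test on one input and collect feedback
    signals (exception triggered, execution time, input size)."""
    start = time.perf_counter()
    exception = None
    try:
        program(test_input)
    except Exception as e:
        exception = type(e).__name__
    elapsed = time.perf_counter() - start
    return {"exception": exception, "time": elapsed, "size": len(test_input)}

def score(feedback, goal):
    """Map raw feedback to a single fitness value for the chosen goal."""
    if goal == "exceptions":
        return 1.0 if feedback["exception"] else 0.0
    if goal == "execution_time":
        return feedback["time"]
    if goal == "complexity":
        return feedback["size"]  # crude stand-in for input-to-code complexity
    raise ValueError(f"unknown goal: {goal}")

# Example: json.loads as the program under test, with one valid and
# one malformed JSON input.
ok = run_with_feedback(json.loads, "[[1, 2], 3]")
bad = run_with_feedback(json.loads, "[[")
print(ok["exception"], bad["exception"])  # None JSONDecodeError
```

Scoring each input under a single goal (or a weighted combination for multi-objective testing) is what allows the same generation loop to be retargeted: only the fitness function changes, not the grammar machinery.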