STAB: Specification-driven Testing for Algorithmic Bottlenecks

📅 2026-05-27
🤖 AI Summary
Existing efficiency-testing approaches either scale up the input size or tailor inputs to one specific implementation, so they struggle to produce test cases that expose an algorithm's worst-case time complexity when only a natural-language problem description is available. This work proposes STAB, described as the first framework that systematically generates bottleneck-inducing test cases from the problem specification alone, without a reference implementation. STAB works in two stages: it first saturates the input constraints through rule-based reasoning and CP-SAT solving, then retrieves adversarial construction principles via keyword matching and k-nearest-neighbor retrieval and uses them to guide a large language model in writing structured test-generation code. On the CodeContests benchmark, STAB raises the rate of generated test cases that expose algorithmic bottlenecks from 50.43% to 73.45% on average for open-source LLMs and from 57.45% to 71.85% on average for closed-source LLMs, with consistent gains across Python, Java, and C++.
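The constraint-saturation stage can be illustrated with a minimal sketch. It is not STAB's implementation: the `Bound` class, the function names, the product-cap constraint form, and the objective (largest product, ties broken by the larger first variable) are all assumptions made for illustration, and a plain exhaustive scan stands in for the CP-SAT solver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """Inclusive range for one input variable (hypothetical helper)."""
    lo: int
    hi: int


def saturate_product_pair(bx, by, cap):
    """Pick (x, y) with x * y <= cap that maximizes x * y.

    Ties are broken in favor of the larger x. This exhaustive scan is a
    stand-in for the CP-SAT optimization step, not the real solver call.
    """
    best_key, best_pair = None, None
    for x in range(max(1, bx.lo), bx.hi + 1):
        y = min(by.hi, cap // x)
        if y < by.lo:
            continue  # no admissible y for this x
        key = (x * y, x)
        if best_key is None or key > best_key:
            best_key, best_pair = key, (x, y)
    if best_pair is None:
        raise ValueError("constraints are infeasible")
    return best_pair


def saturate(bounds, product_caps):
    """Resolve a large admissible size assignment.

    bounds:       {variable name: Bound}
    product_caps: {(x, y): cap} meaning x * y <= cap (assumed constraint form)
    """
    coupled = {name for pair in product_caps for name in pair}
    # Rule-based step: a variable in no joint constraint takes its upper bound.
    assignment = {n: b.hi for n, b in bounds.items() if n not in coupled}
    # Related variables are resolved jointly.
    for (x, y), cap in product_caps.items():
        assignment[x], assignment[y] = saturate_product_pair(
            bounds[x], bounds[y], cap
        )
    return assignment


if __name__ == "__main__":
    bounds = {
        "n": Bound(1, 10**5),
        "m": Bound(1, 10**5),
        "q": Bound(1, 2 * 10**5),
    }
    print(saturate(bounds, {("n", "m"): 2 * 10**5}))
```

With the example bounds, `q` is unconstrained by any joint rule and goes straight to its maximum, while `n` and `m` are resolved together so that their product reaches the cap.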
📝 Abstract
Evaluating the efficiency of algorithmic code requires test cases that expose runtime bottlenecks. Previous methods generate efficiency test cases either by increasing input size or by generating code-specific inputs that make the given implementation run slowly. Consequently, they do not address the structural input conditions that drive the algorithmic worst case. We introduce STAB, a specification-driven pipeline that generates test cases that expose algorithmic bottlenecks from a natural-language problem specification alone. STAB separates the task into constraint-bound maximization and adversarial structure injection. (i) The constraint saturator extracts constraints and resolves large admissible size assignments using rule-based saturation and CP-SAT optimization over related variables. (ii) The adversarial scenario injector retrieves implementation-level adversarial construction principles from a curated scenario catalog using keyword matching and K-nearest neighbors (KNN). STAB encodes the problem specification, resolved boundary, and retrieved construction principles into a structured generation specification, from which the LLM synthesizes a Python test case generator. On CodeContests, STAB raises the rate of generated test cases that expose algorithmic bottlenecks from 50.43% to 73.45% on average across open-source LLMs and from 57.45% to 71.85% on average across closed-source LLMs, with consistent gains across Python, Java, and C++. Our code is available at https://github.com/suhanmen/STAB.
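The retrieval step and the structured generation specification described above can be sketched as follows. Everything here is hypothetical: the three catalog entries, the bag-of-words cosine similarity used for the nearest-neighbor search, the rule that keyword hits come first and nearest neighbors fill the remaining slots, and the field names of the resulting specification are assumptions, not details taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical scenario catalog; the real catalog is curated by the authors.
CATALOG = [
    {"name": "sorted_input", "keywords": {"sort", "pivot"},
     "principle": "Feed sorted or reverse-sorted arrays to stress naive pivot choices."},
    {"name": "hash_collisions", "keywords": {"hash", "dictionary"},
     "principle": "Pick keys that collide so table lookups degrade to linear scans."},
    {"name": "path_graph", "keywords": {"tree", "graph", "depth"},
     "principle": "Use a long path graph so tree depth reaches the node count."},
]


def tokens(text):
    """Lowercase alphabetic tokens of a text."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(spec, catalog, k=2):
    """Return up to k entries: keyword hits first, then nearest neighbors."""
    spec_tokens = tokens(spec)
    spec_vec = Counter(spec_tokens)
    hits = [e for e in catalog if e["keywords"] & set(spec_tokens)]
    rest = [e for e in catalog if e not in hits]
    rest.sort(
        key=lambda e: cosine(spec_vec, Counter(tokens(e["principle"]))),
        reverse=True,
    )
    return (hits + rest)[:k]


def build_generation_spec(spec, boundary, scenarios):
    """Bundle the three inputs that would be handed to the LLM (assumed layout)."""
    return {
        "problem": spec,
        "boundary": boundary,
        "principles": [s["principle"] for s in scenarios],
    }


if __name__ == "__main__":
    problem = "Given a tree with n nodes, report the depth of every node."
    chosen = retrieve(problem, CATALOG, k=2)
    print(build_generation_spec(problem, {"n": 200000}, chosen))
```

A production system would more plausibly use learned embeddings for the nearest-neighbor search; the bag-of-words vectors only keep the sketch dependency-free.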
Problem

Research questions and friction points this paper is trying to address.

algorithmic bottlenecks
efficiency testing
worst-case inputs
test case generation
specification-driven
Innovation

Methods, ideas, or system contributions that make the work stand out.

specification-driven testing
algorithmic bottlenecks
constraint saturation
adversarial scenario injection
LLM-guided test generation