STAB: Specification-driven Testing for Algorithmic Bottlenecks

📅 2026-05-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing approaches struggle to generate, from a natural-language problem description alone, test cases that expose an algorithm's worst-case time complexity. This work proposes STAB, the first framework capable of systematically generating bottleneck-inducing test cases from the problem specification only, without requiring a reference implementation. STAB works in two stages: it first saturates the input constraints through rule-driven reasoning and CP-SAT solving, then injects adversarial structure by using keyword matching and k-nearest neighbor retrieval to guide a large language model in producing structured test-generation code. On the CodeContests benchmark, STAB raises the average proportion of generated tests that expose algorithmic bottlenecks from 50.43% to 73.45% for open-source large language models and from 57.45% to 71.85% for closed-source ones, with consistent gains across Python, Java, and C++.
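To make the distinction between input size and input structure concrete, the following is a minimal sketch that counts comparisons in a first-element-pivot quicksort on two inputs of identical size. The choice of quicksort, the helper name `quicksort_comparisons`, the size `n = 2000`, and the random seed are illustrative assumptions and are not taken from the paper.

```python
import random

def quicksort_comparisons(values):
    """Count element-vs-pivot comparisons in a first-element-pivot quicksort."""
    comparisons = 0
    stack = [list(values)]            # explicit stack avoids deep recursion
    while stack:
        chunk = stack.pop()
        if len(chunk) <= 1:
            continue
        pivot, rest = chunk[0], chunk[1:]
        comparisons += len(rest)      # every remaining element meets the pivot once
        stack.append([x for x in rest if x < pivot])
        stack.append([x for x in rest if x >= pivot])
    return comparisons

n = 2000                              # hypothetical size bound, same for both inputs
rng = random.Random(0)
random_input = rng.sample(range(n), n)   # same size, no adversarial structure
sorted_input = list(range(n))            # same size, adversarial ordering

random_cost = quicksort_comparisons(random_input)
sorted_cost = quicksort_comparisons(sorted_input)

print("random input:", random_cost, "comparisons")
print("sorted input:", sorted_cost, "comparisons")  # n * (n - 1) // 2 = 1999000
```

Both inputs sit at the same size bound, yet only the sorted one drives this particular implementation to its quadratic worst case, which is the kind of structural condition that size scaling alone does not reach.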

📝 Abstract

Evaluating the efficiency of algorithmic code requires test cases that expose runtime bottlenecks. Previous methods generate efficiency test cases either by increasing input size or by generating code-specific inputs that make the given implementation run slowly. Consequently, they do not address the structural input conditions that drive the algorithmic worst case. We introduce STAB, a specification-driven pipeline that generates test cases that expose algorithmic bottlenecks from a natural-language problem specification alone. STAB separates the task into constraint-bound maximization and adversarial structure injection. (i) The constraint saturator extracts constraints and resolves large admissible size assignments using rule-based saturation and CP-SAT optimization over related variables. (ii) The adversarial scenario injector retrieves implementation-level adversarial construction principles from a curated scenario catalog using keyword matching and K-nearest neighbors (KNN). STAB encodes the problem specification, resolved boundary, and retrieved construction principles into a structured generation specification, from which the LLM synthesizes a Python test case generator. On CodeContests, STAB raises the rate of generated test cases that expose algorithmic bottlenecks from 50.43% to 73.45% on average across open-source LLMs and from 57.45% to 71.85% on average across closed-source LLMs, with consistent gains across Python, Java, and C++. Our code is available at https://github.com/suhanmen/STAB.
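The two stages described in the abstract can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: an exhaustive search stands in for the rule-based saturation and CP-SAT optimization, a bag-of-words cosine ranking stands in for the keyword matching and KNN retrieval, and the problem text, bounds, catalog entries, and function names are all hypothetical.

```python
import math
import re
from collections import Counter

# All names and numbers below are hypothetical, chosen only for illustration.
SPEC_TEXT = ("Given a tree with n vertices and m queries, each query runs a dfs "
             "and reports the depth of a vertex.")
BOUNDS = {"n": (1, 1000), "m": (1, 1000)}   # 1 <= n, m <= 1000
PRODUCT_LIMIT = 200_000                      # coupled constraint: n * m <= 200000

# Hypothetical scenario catalog: construction principle -> descriptive keywords.
CATALOG = {
    "hash collision keys": "hash map dictionary lookup collision",
    "sorted input for naive pivot": "sort quicksort pivot ordered array",
    "long path graph": "graph tree depth recursion dfs path",
}

def saturate(bounds, product_limit):
    """Stage 1 stand-in: pick admissible (n, m) with the largest product.

    Ties are broken in favor of the larger n. A real pipeline would hand
    such coupled constraints to a solver instead of enumerating them.
    """
    (n_lo, n_hi), (m_lo, m_hi) = bounds["n"], bounds["m"]
    best_key, best_assignment = None, None
    for n in range(n_lo, n_hi + 1):
        m = min(m_hi, product_limit // n)    # largest admissible m for this n
        if m < m_lo:
            continue
        key = (n * m, n)
        if best_key is None or key > best_key:
            best_key, best_assignment = key, {"n": n, "m": m}
    return best_assignment

def vectorize(text):
    """Lowercase bag-of-words vector over alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(spec_text, catalog, k=1):
    """Stage 2 stand-in: return the k catalog entries most similar to the spec."""
    query = vectorize(spec_text)
    ranked = sorted(catalog,
                    key=lambda name: cosine(query, vectorize(catalog[name])),
                    reverse=True)
    return ranked[:k]

# Structured generation specification that would be handed to the LLM.
spec = {
    "problem": SPEC_TEXT,
    "resolved_bounds": saturate(BOUNDS, PRODUCT_LIMIT),
    "construction_principles": retrieve(SPEC_TEXT, CATALOG, k=1),
}
print(spec["resolved_bounds"])          # {'n': 1000, 'm': 200}
print(spec["construction_principles"])  # ['long path graph']
```

In this toy setting the resolved bounds fill the coupled limit exactly, and the retrieved principle suggests a degenerate path-shaped tree. Per the abstract, the LLM then synthesizes a Python test case generator from a specification of this kind.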

Problem

Research questions and friction points this paper is trying to address.

algorithmic bottlenecks

efficiency testing

worst-case inputs

test case generation

specification-driven

Innovation

Methods, ideas, or system contributions that make the work stand out.

specification-driven testing

algorithmic bottlenecks

constraint saturation