Synthesizing Performance Constraints for Evaluating and Improving Code Efficiency

📅 2025-05-29
🤖 AI Summary
Existing performance tests rely on a small set of hand-curated inputs or on LLM-generated length-stressing workloads, which often fail to expose nuanced performance bottlenecks. WEDGE addresses this by synthesizing performance-characterizing constraints, expressed as branch conditions, that partition a program's execution space into performance-specific regions; a coverage-guided fuzzer is then rewarded for reaching those regions. The resulting test suite, PERFFORGE, induces a significant slowdown compared to the tests in CodeContests, and integrating it substantially improves code optimization approaches that rely on test-driven execution feedback. PERFFORGE is publicly released.

📝 Abstract
Large Language Models (LLMs) are increasingly used to optimize code efficiency. Evaluating their effectiveness, and suggesting further optimization opportunities, often relies on high-quality tests that expose the performance bottlenecks present in a program. However, existing approaches rely on a limited set of hand-curated inputs or on LLM-generated tests that merely stress input length, and so fail to reveal more nuanced optimization opportunities. We present WEDGE, a framework for generating performance-stressing inputs for a given program under test. WEDGE synthesizes explicit performance-characterizing constraints, in the form of branch conditions, that partition the program's execution space into performance-specific regions. When WEDGE is integrated with a coverage-guided fuzzer, reaching different regions provides explicit rewards that steer test generation toward inputs exercising inefficient implementations. Our evaluation shows that the tests generated by WEDGE induce a significant slowdown compared to the tests in CodeContests and to those claimed to be optimized by existing approaches. In terms of utility, integrating our tests substantially improves existing code optimization approaches that rely on test-driven execution feedback. We release PERFFORGE, the performance tests generated by WEDGE, to benchmark future approaches for efficient code generation at https://github.com/UChiSeclab/perfforge.
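
To make the idea of a performance-characterizing constraint concrete, here is a minimal sketch in Python. It is not taken from WEDGE: the toy program (insertion sort), the function names, and the thresholds are all assumptions chosen for illustration. The constraint is written as a branch condition over the input, and it separates two inputs of the same length that differ sharply in cost, which is the gap a purely length-stressing test would miss.

```python
def insertion_sort_shifts(values):
    """Toy program under test: sort a copy and count element shifts (the cost)."""
    data = list(values)
    shifts = 0
    for i in range(1, len(data)):
        key = data[i]
        j = i - 1
        # Each iteration of this loop moves one element one slot to the right.
        while j >= 0 and data[j] > key:
            data[j + 1] = data[j]
            shifts += 1
            j -= 1
        data[j + 1] = key
    return shifts


def in_slow_region(values, min_len=50, min_descent_ratio=0.9):
    """Hypothetical constraint, written as a branch condition on the input.

    True when the input is long enough AND almost every adjacent pair is
    descending. Both thresholds are illustrative assumptions.
    """
    n = len(values)
    if n < min_len:
        return False
    descents = sum(1 for a, b in zip(values, values[1:]) if a > b)
    return descents >= min_descent_ratio * (n - 1)


# Two inputs of identical length: only one satisfies the constraint.
ascending = list(range(100))
descending = list(range(100, 0, -1))
```

Here `in_slow_region(descending)` holds and the sort performs 4950 shifts, while `ascending` has the same length, fails the constraint, and costs 0 shifts.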

Problem

Research questions and friction points this paper is trying to address.

Generating performance-stressing tests for code efficiency evaluation
Identifying nuanced optimization opportunities beyond hand-curated inputs
Improving existing code optimization approaches with test-driven feedback

Innovation

Methods, ideas, or system contributions that make the work stand out.

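WEDGE generates performance-stressing inputs for a given program under test
Synthesizes performance-characterizing constraints as branch conditions that partition the execution space
Integrates with a coverage-guided fuzzer, rewarding inputs that reach regions exercising inefficient implementations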
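
The points above can be sketched as a toy feedback loop. This is an illustrative assumption, not WEDGE's implementation: the program under test (a chained hash table with 8 buckets), the region checker, the mutator, and every name below are hypothetical. The checker plays the role of the synthesized branch conditions, and the loop keeps any mutated input that reaches a region not seen before, mimicking how a coverage-guided fuzzer rewards new coverage.

```python
import random

NUM_BUCKETS = 8  # assumed table size for the toy program


def probe_steps(keys):
    """Toy program under test: chained hash table; cost = chain nodes visited."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    steps = 0
    for key in keys:
        chain = buckets[key % NUM_BUCKETS]
        steps += len(chain)  # walk the whole chain before appending
        chain.append(key)
    return steps


def region_of(keys):
    """Hypothetical constraint checker: region id 0..3 from the largest bucket's share.

    Assumes a non-empty key list.
    """
    counts = [0] * NUM_BUCKETS
    for key in keys:
        counts[key % NUM_BUCKETS] += 1
    return min(3, 4 * max(counts) // len(keys))


def mutate(keys, rng):
    """Replace one randomly chosen key; the input length stays the same."""
    out = list(keys)
    out[rng.randrange(len(out))] = rng.randrange(1000)
    return out


def fuzz(seed, rounds=2000, rng_seed=0):
    """Keep one input per region; reaching a new region is the only reward."""
    rng = random.Random(rng_seed)
    corpus = {region_of(seed): list(seed)}
    for _ in range(rounds):
        parent = rng.choice(list(corpus.values()))
        child = mutate(parent, rng)
        region = region_of(child)
        if region not in corpus:  # new region reached: keep it as a future parent
            corpus[region] = child
    return corpus


seed = list(range(16))                      # evenly spread keys, region 0
worst = [8 * i for i in range(16)]          # every key collides, region 3
corpus = fuzz(seed)
```

For reference, `probe_steps(seed)` is 8 and `probe_steps(worst)` is 120 at the same input length. Which regions the loop reaches depends on the random seed and the number of rounds, so the sketch makes no claim about the final corpus beyond the seed's region being present.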
Jun Yang
Department of Computer Science, The University of Chicago
Cheng-Chi Wang
Department of Computer Science, The University of Chicago
B. A. Stoica
Department of Computer Science, The University of Chicago
Kexin Pei
Assistant Professor, Computer Science, University of Chicago
Security, Software Engineering, Machine Learning