🤖 AI Summary
This work addresses the challenge of implicit safety degradation in iterative code optimization by large language models (LLMs), which often arises from conflicting optimization objectives and evades conventional static analysis. To mitigate this, the authors propose SCAFFOLD-CEGIS, a novel framework that is the first to integrate counterexample-guided inductive synthesis (CEGIS) into LLM-based code optimization. Through multi-agent collaboration, the approach encodes safety requirements as explicit, verifiable hard constraints and enforces safety monotonicity throughout optimization via semantic anchoring and a four-tier gating mechanism, ensuring that safety-critical elements are automatically identified and preserved. Experimental results show that SCAFFOLD-CEGIS reduces the implicit safety degradation rate to 2.1%, substantially outperforming six existing defense strategies, while achieving 100% safety monotonicity.
📝 Abstract
The application of large language models to code generation has evolved from one-shot generation to iterative refinement, yet how security evolves across iterations remains insufficiently understood. Through comparative experiments on three mainstream LLMs, this paper reveals the iterative refinement paradox: specification drift during multi-objective optimization causes security to degrade gradually over successive iterations. Taking GPT-4o as an example, 43.7% of iteration chains contain more vulnerabilities than the baseline after ten rounds, and cross-model experiments show that the phenomenon is prevalent across models. Further analysis shows that simply introducing static application security testing (SAST) gating cannot effectively suppress this degradation; instead, it increases the latent security degradation rate from 12.5% under the unprotected baseline to 20.8%. The root cause is that static-analysis rules cannot cover structural degradations such as the removal of defensive logic or the weakening of exception handling. To address this problem, we propose the SCAFFOLD-CEGIS framework. Drawing on the counterexample-guided inductive synthesis (CEGIS) paradigm, the framework adopts a multi-agent collaborative architecture that transforms security constraints from implicit prompts into explicit, verifiable constraints. It automatically identifies and solidifies security-critical elements as hard constraints through semantic anchoring, enforces safety monotonicity through four-layer gated verification, and continuously assimilates experience from failures. Comparative experiments against six existing defense methods show that the full framework reduces the latent security degradation rate to 2.1% and achieves a safety monotonicity rate of 100%.
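The abstract's core loop can be illustrated with a minimal sketch of CEGIS-style gated refinement: a synthesizer proposes an optimized candidate, a verifier gates it, and any failing safety check is fed back as a hard constraint so the next proposal cannot repeat the regression. All names (`cegis_optimize`, the toy `optimize`/`verify` callables, the list-of-features code model) are hypothetical illustrations, not the paper's actual agents or constraint format.

```python
# Minimal sketch of a CEGIS-style gating loop (hypothetical API; the paper's
# multi-agent architecture and four-layer gating are far richer than this).

def cegis_optimize(code, optimize, verify, max_rounds=10):
    """Refine `code` iteratively; a candidate is accepted only if every
    safety check passes, so accepted states are safety-monotone."""
    constraints = []  # counterexamples accumulated as hard constraints
    for _ in range(max_rounds):
        candidate = optimize(code, constraints)   # synthesizer step
        counterexample = verify(candidate)        # verifier / gating step
        if counterexample is None:
            code = candidate                      # safe: accept the candidate
        else:
            constraints.append(counterexample)    # learn from the failure
    return code


# Toy instantiation: "code" is a list of features; the optimizer greedily
# drops any unconstrained feature; the verifier demands input validation.
def toy_optimize(code, constraints):
    for feat in reversed(code):
        if feat not in constraints:
            return [f for f in code if f != feat]
    return list(code)

def toy_verify(candidate):
    return None if "input_validation" in candidate else "input_validation"

result = cegis_optimize(["input_validation", "logging"],
                        toy_optimize, toy_verify, max_rounds=3)
print(result)  # the safety-critical feature survives optimization
```

In the toy run the optimizer first drops `logging` (accepted), then tries to drop `input_validation`; the verifier rejects that candidate and records it as a constraint, so the final code still contains the safety-critical element.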