RESCUE: Retrieval Augmented Secure Code Generation

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLMs still generate insecure code. Conventional RAG struggles with noisy security documentation and fails to capture the security semantics implicit in task descriptions. To address these challenges, this paper proposes RESCUE, a hierarchical RAG framework tailored for secure code generation. Its core contributions are: (1) a hybrid security knowledge base that combines LLM-driven cluster-then-summarize distillation with program slicing, reducing the noise of raw security documents and yielding both high-level security guidelines and concise, security-focused code examples; and (2) a hierarchical, multi-faceted retrieval mechanism that traverses this knowledge base from top to bottom and integrates multiple security-critical facts at each level, surfacing the latent security requirements in task descriptions. Experiments on four benchmarks and six mainstream LLMs show that RESCUE improves SecurePass@1 by an average of 4.8 points and outperforms five state-of-the-art secure code generation methods, establishing new state-of-the-art security performance.
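The summary describes retrieval that walks the knowledge base from top to bottom and fuses several security-critical signals at each level, but it does not specify the facets or the scoring function. The Python sketch below only illustrates that general shape under assumptions of our own: a two-level knowledge base (guidelines, then code examples under each guideline), two hypothetical facets (a textual description and a set of security-relevant API names), token-overlap Jaccard similarity standing in for a real embedding model, and a fixed weighted sum to fuse the facets. The names `Guideline`, `Example`, `fused_score` and `retrieve` are illustrative and are not RESCUE's actual interface.

```python
import re
from dataclasses import dataclass, field


def tokens(text: str) -> set[str]:
    # Lowercase word tokens: a crude stand-in for an embedding model (assumption).
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class Example:
    code: str
    apis: set[str]


@dataclass
class Guideline:
    cwe: str
    summary: str
    apis: set[str]
    examples: list[Example] = field(default_factory=list)


def fused_score(task: str, task_apis: set[str], text: str, apis: set[str],
                w_text: float = 0.5, w_api: float = 0.5) -> float:
    # Fuse two hypothetical facets: description overlap and security-relevant API overlap.
    return w_text * jaccard(tokens(task), tokens(text)) + w_api * jaccard(task_apis, apis)


def retrieve(kb: list[Guideline], task: str, task_apis: set[str],
             k_guidelines: int = 1, k_examples: int = 1):
    # Level 1: rank the high-level guidelines against the task.
    top = sorted(kb, key=lambda g: fused_score(task, task_apis, g.summary, g.apis),
                 reverse=True)[:k_guidelines]
    results = []
    for g in top:
        # Level 2: rank only the code examples stored under the selected guidelines.
        examples = sorted(g.examples,
                          key=lambda e: fused_score(task, task_apis, e.code, e.apis),
                          reverse=True)
        results.append((g, examples[:k_examples]))
    return results


# A toy knowledge base with two hypothetical guideline entries.
kb = [
    Guideline("CWE-78", "avoid passing untrusted input to a shell command",
              {"os.system", "subprocess.run"},
              [Example("subprocess.run(['ls', path], check=True)", {"subprocess.run"})]),
    Guideline("CWE-89", "use parameterized sql queries instead of string formatting",
              {"cursor.execute"},
              [Example("cursor.execute('SELECT * FROM users WHERE id = ?', (uid,))",
                       {"cursor.execute"})]),
]
hits = retrieve(kb, "look up a user by id in the sql database", {"cursor.execute"})
for guideline, examples in hits:
    print(guideline.cwe, examples[0].code)
```

Restricting the second level to examples under the selected guidelines is what makes the search hierarchical: examples under a low-ranked guideline are never scored. A real system would presumably use embedding similarity and tuned weights rather than token overlap and a fixed 50/50 split.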

📝 Abstract
Despite recent advances, Large Language Models (LLMs) still generate vulnerable code. Retrieval-Augmented Generation (RAG) has the potential to enhance LLMs for secure code generation by incorporating external security knowledge. However, the conventional RAG design struggles with the noise of raw security-related documents, and existing retrieval methods overlook the significant security semantics implicitly embedded in task descriptions. To address these issues, we propose RESCUE, a new RAG framework for secure code generation with two key innovations. First, we propose a hybrid knowledge base construction method that combines LLM-assisted cluster-then-summarize distillation with program slicing, producing both high-level security guidelines and concise, security-focused code examples. Second, we design a hierarchical multi-faceted retrieval that traverses the constructed knowledge base from top to bottom and integrates multiple security-critical facts at each hierarchical level, ensuring comprehensive and accurate retrieval. We evaluated RESCUE on four benchmarks and compared it with five state-of-the-art secure code generation methods on six LLMs. The results demonstrate that RESCUE improves the SecurePass@1 metric by an average of 4.8 points, establishing new state-of-the-art security performance. Furthermore, we performed in-depth analyses and ablation studies to rigorously validate the effectiveness of individual components in RESCUE.
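The headline result is stated in SecurePass@1, which is not defined in this abstract. As an assumption rather than the paper's definition, the sketch below follows the common pass@k convention: the unbiased estimator 1 - C(n - c, k) / C(n, k), where a sample counts as a success only if it passes the functional tests and is also judged secure. For k = 1 this reduces to c / n per task, averaged over the benchmark. The function names and the per-task counts are hypothetical.

```python
import math


def secure_pass_at_k(n: int, c: int, k: int = 1) -> float:
    """Unbiased pass@k-style estimator counting only secure-and-correct samples.

    n: samples generated for one task
    c: samples that pass the functional tests AND are judged secure
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one secure, correct sample.
        return 1.0
    return 1.0 - math.comb(n - c, k) / math.comb(n, k)


def mean_secure_pass_at_k(per_task: list[tuple[int, int]], k: int = 1) -> float:
    # Average the per-task estimates over the whole benchmark.
    return sum(secure_pass_at_k(n, c, k) for n, c in per_task) / len(per_task)


# Three hypothetical tasks with 10 samples each: (n, c) pairs.
scores = [(10, 7), (10, 2), (10, 0)]
print(round(100 * mean_secure_pass_at_k(scores), 1))
```

Because a sample must be both correct and secure to count, a model cannot raise this score by emitting trivially "safe" code that fails its tests.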
Problem

LLMs still generate vulnerable code and need external security knowledge
Conventional RAG struggles with the noise in raw security documents
Existing retrieval methods overlook the security semantics implicit in task descriptions
Innovation

Hybrid knowledge base combining distilled security guidelines with concise, security-focused code examples
Hierarchical multi-faceted retrieval that integrates multiple security-critical facts at each level
LLM-assisted cluster-then-summarize distillation combined with program slicing
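Program slicing, listed above as part of how concise security-focused code examples are produced, keeps only the statements that a chosen program point depends on. The paper's slicing criterion and tooling are not given here, so the following is an assumption-laden sketch of the textbook idea: a backward data-dependence slice over straight-line, top-level Python statements, built with the standard `ast` module. It treats a security-sensitive call (a database `execute`) as the slicing criterion, ignores control dependencies, augmented assignments, and side effects through attributes, and uses hypothetical names (`defs_uses`, `backward_slice`).

```python
import ast


def defs_uses(stmt: ast.stmt) -> tuple[set[str], set[str]]:
    """Names a statement binds (defs) and reads (uses)."""
    defs, uses = set(), set()
    for node in ast.walk(stmt):
        if isinstance(node, ast.Name):
            (defs if isinstance(node.ctx, ast.Store) else uses).add(node.id)
        elif isinstance(node, ast.alias):
            # `import a.b as c` binds `c`; `import a.b` binds `a`.
            defs.add(node.asname or node.name.split(".")[0])
    return defs, uses


def backward_slice(source: str, sink_line: int) -> str:
    """Keep the top-level statements the statement at `sink_line` depends on via data flow."""
    stmts = ast.parse(source).body
    lines = source.splitlines()
    sink = next(i for i, s in enumerate(stmts) if s.lineno == sink_line)
    needed = defs_uses(stmts[sink])[1]
    kept = [sink]
    # Walk backwards: keep a statement if it defines a name that is still needed,
    # then replace that name with the names the statement itself reads.
    for i in range(sink - 1, -1, -1):
        defs, uses = defs_uses(stmts[i])
        if defs & needed:
            kept.append(i)
            needed = (needed - defs) | uses
    kept.sort()
    return "\n".join(
        "\n".join(lines[stmts[i].lineno - 1:stmts[i].end_lineno]) for i in kept
    )


# Hypothetical raw snippet; it is only parsed, never executed.
snippet = """\
import logging
import sqlite3
uid = request_args["id"]
log = logging.getLogger(__name__)
log.info("lookup")
conn = sqlite3.connect("app.db")
cursor = conn.cursor()
query = "SELECT * FROM users WHERE id = ?"
cursor.execute(query, (uid,))
"""
print(backward_slice(snippet, sink_line=9))
```

On this input the logging statements drop out and what remains is the parameterized-query pattern leading to the sink, the kind of short, security-focused example the knowledge base is described as storing.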