AI Summary
Large language models (LLMs) for code generation often reproduce unsafe patterns present in their training data. Existing hardening approaches that fine-tune with supervision from only the final layer are constrained by a "last-layer bottleneck," which limits their ability to capture vulnerability-discriminative signals distributed across multiple layers. Through layer-wise linear probing, this study finds that vulnerability-related signals are most detectable in the intermediate-to-upper layers and attenuate toward the final layers. Building on this insight, the authors propose DeepGuard, a framework that uses an attention-based module to aggregate representations from multiple upper layers and combines multi-objective joint training with lightweight inference-time steering. Evaluated across five code LLMs, DeepGuard improves the secure-and-correct generation rate by 11.9% on average over strong baselines such as SVEN, preserves functional correctness, and generalizes to held-out vulnerability types.
Abstract
Large Language Models (LLMs) for code generation can replicate insecure patterns from their training data. To mitigate this, a common strategy for security hardening is to fine-tune models using supervision derived from the final transformer layer. However, this design may suffer from a final-layer bottleneck: vulnerability-discriminative cues can be distributed across layers and become less detectable near the output representations optimized for next-token prediction. To diagnose this issue, we perform layer-wise linear probing. We observe that vulnerability-related signals are most detectable in a band of intermediate-to-upper layers yet attenuate toward the final layers. Motivated by this observation, we introduce DeepGuard, a framework that leverages distributed security-relevant cues by aggregating representations from multiple upper layers via an attention-based module. The aggregated signal powers a dedicated security analyzer within a multi-objective training objective that balances security enhancement and functional correctness, and further supports a lightweight inference-time steering strategy. Extensive experiments across five code LLMs demonstrate that DeepGuard improves the secure-and-correct generation rate by an average of 11.9% over strong baselines such as SVEN. It also preserves functional correctness while exhibiting generalization to held-out vulnerability types. Our code is public at https://github.com/unknownhl/DeepGuard.
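To make the idea of attention-based aggregation over layers concrete, the following is a minimal sketch and not DeepGuard's actual implementation. It assumes each upper layer contributes one pooled hidden-state vector and that a single query vector scores the layers. In a trained system that query would be learned; here it is fixed by hand. The names `aggregate_layers`, `layer_states`, and `query` are hypothetical and do not come from the paper or its repository.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_layers(layer_states: List[Vector], query: Vector) -> Tuple[List[float], Vector]:
    """Attention-pool one hidden-state vector per layer into a single vector.

    Assumption: `query` stands in for a learned parameter; each layer is scored
    by a scaled dot product with it, and the scores are normalized with softmax.
    """
    dim = len(query)
    scores = [
        sum(h_i * q_i for h_i, q_i in zip(state, query)) / math.sqrt(dim)
        for state in layer_states
    ]
    weights = softmax(scores)
    # Weighted sum of the layer vectors, one coordinate at a time.
    pooled = [
        sum(w * state[i] for w, state in zip(weights, layer_states))
        for i in range(dim)
    ]
    return weights, pooled


if __name__ == "__main__":
    # Three toy "layers" with 2-dimensional states (illustrative values only).
    states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    weights, pooled = aggregate_layers(states, query=[1.0, 0.0])
    print("layer weights:", weights)
    print("aggregated state:", pooled)
```

In this toy setup the first and third layers align equally with the query and receive equal weight, while the second layer is down-weighted. A downstream security analyzer would consume the aggregated vector rather than the final-layer state alone.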