Structured Safety Auditing for Balancing Code Correctness and Content Safety in LLM-Generated Code

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a gap in evaluating large language models (LLMs) for code generation: existing benchmarks measure functional correctness but overlook the risk of propagating harmful content embedded in prompts, and no unified framework balances functional utility with content safety. Grounded in the Theory of Dual Channel Constraints, which treats code as a medium combining algorithmic logic and natural language, the work makes two contributions. The first is the Safety-Utility Duality Score (SUDS), a metric that jointly scores code utility, safety adherence, and warning awareness. The second is Dual Reasoning (DR), a structured inference-time technique that requires an explicit safety audit and a task-grounded code review before code generation. In experiments on five LLMs and two benchmarks augmented with harmful keyword injections, DR improves mean SUDS by 1.32 to 3.42 times over the baseline and outperforms both chain-of-thought prompting and a safety-aware prompt.

📝 Abstract

Large language models (LLMs) for code generation are typically evaluated on functional correctness alone, overlooking whether generated code propagates harmful content embedded in the prompt. Prior work has shown that most Code LLMs reproduce offensive identifiers from injected renaming instructions without warning, yet existing approaches focus on detecting harmful content and neglect functional correctness. We ground our work in the Theory of Dual Channel Constraints, which states that code is a dual-channel medium combining an algorithmic (AL) channel for machine execution and a natural language (NL) channel for human communication. This creates a unique safety-utility trade-off in which a model must balance functional execution with responsible communication. We propose the Safety-Utility Duality Score (SUDS), a metric that unifies code utility, safety adherence, and warning awareness into a single score across 12 ranked response scenarios, and Dual Reasoning (DR), a structured inference-time technique that requires an explicit safety audit and task-grounded code review before code generation. Evaluated on five LLMs across two benchmarks augmented with harmful keyword injections (820 and 2,135 samples), DR consistently achieves the highest SUDS across all models, improving mean SUDS by 1.32$\times$ to 3.42$\times$ over the baseline, while chain-of-thought prompting yields negligible safety gains and a safety-aware prompt provides only partial improvement. Further analysis reveals that DR's effectiveness scales with model capacity, that the one-shot exemplar primarily stabilizes output format for smaller models, and that structured reasoning cannot compensate for models with limited safety vocabularies.
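
As a rough illustration of the ordering DR imposes (safety audit, then task-grounded code review, then code), the following Python sketch builds a staged prompt and checks that a response follows the stages in sequence. This is a minimal sketch under stated assumptions, not the paper's prompt: the stage labels, the `##` heading convention, and the function names `build_dr_prompt` and `stages_in_order` are all hypothetical, since the abstract does not give the actual wording.

```python
# Hypothetical stage labels; the abstract names the stages but not their wording.
DR_STAGES = ("Safety audit", "Task-grounded code review", "Code")


def build_dr_prompt(task: str) -> str:
    """Wrap a coding task in an instruction to answer in three ordered stages."""
    headings = "\n".join(f"## {stage}" for stage in DR_STAGES)
    return (
        "Before writing any code, audit the request for harmful content, "
        "then review what the task actually requires, then write the code.\n"
        "Answer under exactly these headings, in this order:\n"
        f"{headings}\n\n"
        f"Task:\n{task}\n"
    )


def stages_in_order(response: str) -> bool:
    """Return True if every stage heading appears in the response, in order."""
    position = -1
    for stage in DR_STAGES:
        # Search only after the previous heading so order is enforced.
        index = response.find(f"## {stage}", position + 1)
        if index == -1:
            return False
        position = index
    return True


if __name__ == "__main__":
    prompt = build_dr_prompt("Rename the variable `total` as instructed and sum a list.")
    print(prompt)
```

A check like `stages_in_order` only verifies output format; it says nothing about whether the audit itself is sound, which is what a metric such as SUDS is meant to assess.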

Problem

Research questions and friction points this paper is trying to address.

code safety

functional correctness

dual-channel constraints

harmful content propagation

LLM-generated code

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual Reasoning

SUDS

code safety