DSCD: Large Language Model Detoxification with Self-Constrained Decoding

📅 2025-10-15
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Large language models (LLMs) face significant challenges in detoxification, including high computational overhead and degraded text fluency when relying on external constraints. To address this, we propose a parameter-free self-constrained decoding method that dynamically modulates token-level probability distributions during generation, strengthening safety-relevant layers while suppressing toxic ones, guided by multi-layer attention analysis. This lightweight, plug-and-play approach requires no fine-tuning or external modules, ensuring broad compatibility and seamless integration with existing detoxification techniques. Extensive experiments across multiple open-source LLMs and standard benchmark datasets demonstrate state-of-the-art performance in three critical dimensions: detoxification efficacy, generation fluency, and inference efficiency, substantially outperforming prevailing decoding-based detoxification approaches.
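
The summary states the mechanism only at a high level. As a rough illustration of what layer-contrastive modulation of the next-token distribution could look like, here is a minimal sketch; the function, the layer indices, and the contrast weight `alpha` are assumptions for illustration, not the paper's implementation.

```python
import torch.nn.functional as F

def self_constrained_next_token(hidden_states, lm_head,
                                safe_layer, toxic_layer, alpha=1.0):
    """Layer-contrastive next-token distribution (illustrative, not DSCD itself).

    hidden_states: per-layer hidden states for the current position, e.g. the
                   tuple returned with output_hidden_states=True, each entry
                   of shape (hidden_dim,).
    lm_head:       the model's unembedding projection (hidden_dim -> vocab).
    safe_layer:    index of a layer treated as safety-relevant (assumed).
    toxic_layer:   index of a layer treated as toxicity-prone (assumed).
    alpha:         contrast strength; alpha = 0 decodes from safe_layer alone.
    """
    # Project intermediate hidden states through the unembedding to obtain
    # "early exit" logits for each layer of interest.
    safe_logits = lm_head(hidden_states[safe_layer])
    toxic_logits = lm_head(hidden_states[toxic_layer])

    # Strengthen the safety layer's view and weaken the toxic layer's view:
    # in log space this is a weighted contrast between the two.
    contrastive = safe_logits + alpha * (safe_logits - toxic_logits)
    return F.softmax(contrastive, dim=-1)
```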

📝 Abstract
Detoxification in large language models (LLMs) remains a significant research challenge. Existing decoding-based detoxification methods all rely on external constraints, which incur additional resource overhead and degrade generation fluency. This work proposes Detoxification with Self-Constrained Decoding (DSCD), a novel method for LLM detoxification without parameter fine-tuning. During output generation, DSCD strengthens the next-token distribution of the safety layer while weakening those of the hallucination and toxicity layers, which effectively diminishes toxicity and enhances output safety. DSCD is lightweight, highly compatible, and plug-and-play, readily integrating with existing detoxification methods for further performance improvement. Extensive experiments on representative open-source LLMs and public datasets validate DSCD's effectiveness, demonstrating state-of-the-art (SOTA) performance in both detoxification and generation fluency, with superior efficiency compared to existing methods. These results highlight DSCD's potential as a practical and scalable solution for safer LLM deployment.
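
The abstract does not spell out the update rule. Read as a layer-contrastive adjustment of the next-token distribution, one plausible formalization is the following, where the contrast weight alpha and the layer indices s and t are assumptions:

```latex
\[
\tilde{p}(x_i \mid x_{<i}) =
  \operatorname{softmax}\!\left( z_i^{(s)} + \alpha \left( z_i^{(s)} - z_i^{(t)} \right) \right)
\]
% z_i^{(s)}, z_i^{(t)}: logits from projecting the hidden states of the
% safety layer s and a toxicity layer t through the unembedding matrix;
% alpha >= 0 sets how strongly the safety layer is strengthened relative
% to the toxic layer (alpha = 0 recovers plain decoding from layer s).
```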
Problem

Research questions and friction points this paper is trying to address.

Reducing toxicity in large language models without fine-tuning parameters
Maintaining generation fluency while enhancing output safety in LLMs
Providing lightweight plug-and-play detoxification with high compatibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-constrained decoding without parameter fine-tuning
Strengthens safety layer while weakening toxic layers
Lightweight, plug-and-play integration with existing methods (a decoding-hook sketch follows this list)
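
The plug-and-play claim suggests the method can live entirely at the decoding interface. A minimal sketch of such an integration point, using the HuggingFace `LogitsProcessor` hook, is below; the `DSCDLogitsProcessor` class and its `contrast_fn` argument are hypothetical stand-ins, not the authors' code.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class DSCDLogitsProcessor(LogitsProcessor):
    """Hypothetical decoding hook: shifts next-token scores by a contrastive term."""

    def __init__(self, contrast_fn, alpha: float = 1.0):
        # contrast_fn is assumed to map input_ids to a (batch, vocab) tensor of
        # (safety-layer logits - toxic-layer logits); how DSCD actually derives
        # this term from multi-layer attention analysis is not shown here.
        self.contrast_fn = contrast_fn
        self.alpha = alpha

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        # Nudge the model's own next-token scores toward the safety layer's view.
        return scores + self.alpha * self.contrast_fn(input_ids)

# Because generate() composes logits processors, such a hook can stack with
# other decoding-time detoxification methods without touching model weights:
# processors = LogitsProcessorList([DSCDLogitsProcessor(contrast_fn)])
# model.generate(input_ids, logits_processor=processors)
```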
Authors
Ming Dong
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University
Jinkui Zhang
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University
Bolong Zheng
Huazhong University of Science and Technology
Xinhui Tu
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University
Po Hu
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University
Tingting He
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University