DSCD: Large Language Model Detoxification with Self-Constrained Decoding

📅 2025-10-15
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Large language models (LLMs) face significant challenges in detoxification, including high computational overhead and degraded text fluency when relying on external constraints. To address this, we propose a parameter-free self-constrained decoding method that dynamically modulates token-level probability distributions during generation, strengthening safety-relevant layers while suppressing toxic ones, guided by multi-layer attention analysis. This lightweight, plug-and-play approach requires no fine-tuning or external modules, ensuring broad compatibility and seamless integration with existing detoxification techniques. Extensive experiments across multiple open-source LLMs and standard benchmark datasets demonstrate state-of-the-art performance in three critical dimensions: detoxification efficacy, generation fluency, and inference efficiency, substantially outperforming prevailing decoding-based detoxification approaches.
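
The summary states the mechanism only at a high level. As a rough illustration of what layer-contrastive modulation of the next-token distribution could look like, here is a minimal sketch; the function, the layer indices, and the contrast weight `alpha` are assumptions for illustration, not the paper's implementation.

```python
import torch.nn.functional as F

def self_constrained_next_token(hidden_states, lm_head,
                                safe_layer, toxic_layer, alpha=1.0):
    """Layer-contrastive next-token distribution (illustrative, not DSCD itself).

    hidden_states: per-layer hidden states for the current position, e.g. the
                   tuple returned with output_hidden_states=True, each entry
                   of shape (hidden_dim,).
    lm_head:       the model's unembedding projection (hidden_dim -> vocab).
    safe_layer:    index of a layer treated as safety-relevant (assumed).
    toxic_layer:   index of a layer treated as toxicity-prone (assumed).
    alpha:         contrast strength; alpha = 0 decodes from safe_layer alone.
    """
    # Project intermediate hidden states through the unembedding to obtain
    # "early exit" logits for each layer of interest.
    safe_logits = lm_head(hidden_states[safe_layer])
    toxic_logits = lm_head(hidden_states[toxic_layer])

    # Strengthen the safety layer's view and weaken the toxic layer's view:
    # in log space this is a weighted contrast between the two.
    contrastive = safe_logits + alpha * (safe_logits - toxic_logits)
    return F.softmax(contrastive, dim=-1)
```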

📝 Abstract
Detoxification in large language models (LLMs) remains a significant research challenge. Existing decoding-based detoxification methods all rely on external constraints, which incur additional resource overhead and degrade generation fluency. This work proposes Detoxification with Self-Constrained Decoding (DSCD), a novel method for LLM detoxification without parameter fine-tuning. During output generation, DSCD strengthens the next-token distribution of the safety layer while weakening those of the hallucination and toxicity layers, which effectively diminishes toxicity and enhances output safety. DSCD is lightweight, highly compatible, and plug-and-play, readily integrating with existing detoxification methods for further performance improvement. Extensive experiments on representative open-source LLMs and public datasets validate DSCD's effectiveness, demonstrating state-of-the-art (SOTA) performance in both detoxification and generation fluency, with superior efficiency compared to existing methods. These results highlight DSCD's potential as a practical and scalable solution for safer LLM deployment.
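
The abstract does not spell out the update rule. Read as a layer-contrastive adjustment of the next-token distribution, one plausible formalization is the following, where the contrast weight alpha and the layer indices s and t are assumptions:

```latex
\[
\tilde{p}(x_i \mid x_{<i}) =
  \operatorname{softmax}\!\left( z_i^{(s)} + \alpha \left( z_i^{(s)} - z_i^{(t)} \right) \right)
\]
% z_i^{(s)}, z_i^{(t)}: logits from projecting the hidden states of the
% safety layer s and a toxicity layer t through the unembedding matrix;
% alpha >= 0 sets how strongly the safety layer is strengthened relative
% to the toxic layer (alpha = 0 recovers plain decoding from layer s).
```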
Problem

Research questions and friction points this paper is trying to address.

Reducing toxicity in large language models without fine-tuning parameters
Maintaining generation fluency while enhancing output safety in LLMs
Providing lightweight plug-and-play detoxification with high compatibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-constrained decoding without parameter fine-tuning
Strengthens safety layer while weakening toxic layers
Lightweight, plug-and-play integration with existing methods (a decoding-hook sketch follows this list)
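
The plug-and-play claim suggests the method can live entirely at the decoding interface. A minimal sketch of such an integration point, using the HuggingFace `LogitsProcessor` hook, is below; the `DSCDLogitsProcessor` class and its `contrast_fn` argument are hypothetical stand-ins, not the authors' code.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class DSCDLogitsProcessor(LogitsProcessor):
    """Hypothetical decoding hook: shifts next-token scores by a contrastive term."""

    def __init__(self, contrast_fn, alpha: float = 1.0):
        # contrast_fn is assumed to map input_ids to a (batch, vocab) tensor of
        # (safety-layer logits - toxic-layer logits); how DSCD actually derives
        # this term from multi-layer attention analysis is not shown here.
        self.contrast_fn = contrast_fn
        self.alpha = alpha

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        # Nudge the model's own next-token scores toward the safety layer's view.
        return scores + self.alpha * self.contrast_fn(input_ids)

# Because generate() composes logits processors, such a hook can stack with
# other decoding-time detoxification methods without touching model weights:
# processors = LogitsProcessorList([DSCDLogitsProcessor(contrast_fn)])
# model.generate(input_ids, logits_processor=processors)
```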
Authors
Ming Dong
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University
Jinkui Zhang
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University
Bolong Zheng
Huazhong University of Science and Technology
Xinhui Tu
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University
Po Hu
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University
Tingting He
Hubei Provincial Key Laboratory of Artificial Intelligence and Smart Learning, National Language Resources Monitoring and Research Center for Network Media, Central China Normal University