🤖 AI Summary
To address hallucination in large language models (LLMs), this paper proposes a layer-pruning-based contrastive decoding method. Instead of relying on conventional early-exit mechanisms, it constructs a lightweight, domain-agnostic "contrastive model" (an amateur model) by pruning the top layers of the Transformer; this pruned model runs in parallel with the full-parameter model during inference, and their logits are dynamically weighted and fused. Because the pruned model produces more informative, better-aligned logits than early-exit outputs, the approach yields stronger discriminative contrastive signals. Experiments demonstrate substantial improvements in factual accuracy (e.g., +3.2–7.8 points on FactScore and FEVER) while incurring only ~12% additional latency, keeping inference overhead practical. The core contribution is the first use of structured layer pruning to instantiate a contrastive model, enabling efficient, plug-and-play factual consistency enhancement.
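The contrastive fusion step can be sketched with the standard contrastive-decoding rule: score each token by the gap between the expert (full) model's and the amateur (pruned) model's log-probabilities, restricted to tokens the expert deems plausible. This is a minimal sketch under assumed details; the plausibility cutoff `beta` and the exact weighting follow DoLa-style contrastive decoding and may differ from PruneCD's actual fusion scheme.

```python
import numpy as np

def log_softmax(x):
    # Numerically stable log-softmax over a 1-D logit vector.
    x = x - np.max(x)
    return x - np.log(np.sum(np.exp(x)))

def contrastive_next_token(expert_logits, amateur_logits, beta=0.1):
    """Pick the next token by contrasting the full ('expert') model
    against the layer-pruned ('amateur') model.

    beta is an assumed hyperparameter: only tokens whose expert
    probability is >= beta * (max expert probability) compete, which
    keeps the contrast from promoting implausible tokens.
    """
    log_p_expert = log_softmax(np.asarray(expert_logits, dtype=float))
    log_p_amateur = log_softmax(np.asarray(amateur_logits, dtype=float))
    p_expert = np.exp(log_p_expert)

    # Plausibility mask: prune tokens the expert itself finds unlikely.
    valid = p_expert >= beta * p_expert.max()

    # Contrastive score: tokens the amateur over-predicts are penalized.
    score = np.where(valid, log_p_expert - log_p_amateur, -np.inf)
    return int(np.argmax(score))

# Toy example: both models favor token 0, but the amateur is far more
# confident about it, so the contrast promotes token 1 instead.
expert = np.array([5.0, 4.5, 0.0])
amateur = np.array([5.0, 1.0, 0.0])
print(contrastive_next_token(expert, amateur))  # → 1
```

The key design point illustrated here is that the contrast rewards tokens where the full model's extra layers add probability mass the shallow amateur lacks, which is exactly the signal the paper associates with factual knowledge.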
📝 Abstract
To mitigate the hallucination problem in large language models, DoLa exploits early exit logits from the same model as a contrastive prior. However, we found that these early exit logits tend to be flat, low in magnitude, and fail to reflect meaningful contrasts. To address this, we propose PruneCD, a novel contrastive decoding method that constructs the amateur model via layer pruning rather than early exit. This design leads to more informative and well-aligned logits, enabling more effective contrastive decoding. Through qualitative and quantitative analyses, we demonstrate that PruneCD consistently improves factuality with minimal inference overhead, offering a robust and practical approach to mitigating hallucinations in LLMs.