Beyond "I cannot fulfill this request": Alleviating Rigid Rejection in LLMs via Label Enhancement

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the rigid, unnatural template-based refusals that safety-aligned large language models commonly produce. The authors propose LANCE, which uses variational inference to perform label enhancement, predicting a continuous distribution over fine-grained refusal categories. That distribution supplies multi-way textual gradients to a refinement model, which neutralizes the hazardous elements of a prompt so that the model can respond safely and in context instead of falling back on a stock refusal. According to the reported experiments, LANCE substantially reduces rigid refusals while maintaining high safety, and it outperforms existing baselines on the helpfulness and naturalness of responses.

📝 Abstract

Large Language Models (LLMs) rely on safety alignment to comply with safe requests while refusing harmful ones. However, traditional refusal mechanisms often lead to "rigid rejection," where a general template (e.g., "I cannot fulfill this request") is applied indiscriminately and severely undermines the naturalness of interactions between humans and LLMs. To address this issue, this paper proposes LANCE, which ensures safe yet flexible and natural responses via label enhancement. Specifically, LANCE employs variational inference to perform label enhancement, predicting a continuous distribution across multiple rejection categories. These fine-grained rejection distributions provide multi-way textual gradients for a refinement model to neutralize the hazardous elements in the prompt, so that the LLM can generate safe responses that avoid rigid rejections while preserving the naturalness of interactions. Experiments demonstrate that LANCE significantly alleviates the rigid rejection problem while maintaining high safety standards, outperforming existing baseline models in terms of helpfulness and naturalness of responses.
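
To make the idea of a continuous distribution over rejection categories concrete, here is a minimal sketch. It is not the paper's method: it substitutes a plain softmax for the variational inference step, and the category names, the scores, the threshold, and the helper names `enhance_label` and `feedback_messages` are all assumptions made for illustration. It only shows how a soft label distribution could yield several textual hints, one per relevant category, rather than a single refuse-or-comply decision.

```python
import math

# Hypothetical refusal categories; the paper's actual taxonomy is not given here.
CATEGORIES = ["violence", "illegal_activity", "privacy", "self_harm"]


def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def enhance_label(scores):
    """Map per-category scores to a continuous label distribution.

    Assumption: a softmax stands in for the variational inference step.
    """
    return dict(zip(CATEGORIES, softmax(scores)))


def feedback_messages(distribution, threshold=0.2):
    """Emit one textual hint per category whose weight is at or above the threshold."""
    ranked = sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)
    return [
        f"Neutralize the '{name}' element (weight {weight:.2f})."
        for name, weight in ranked
        if weight >= threshold
    ]


# Made-up scores for a single prompt, one per category.
scores = [2.0, 1.0, 0.0, -1.0]
distribution = enhance_label(scores)
hints = feedback_messages(distribution)

for hint in hints:
    print(hint)
```

With these made-up scores, two categories clear the threshold, so a downstream refinement step would receive two hints instead of one binary refusal flag. How LANCE actually turns its distributions into textual gradients is not specified in the abstract.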

Problem

Research questions and friction points this paper is trying to address.

rigid rejection

safety alignment

Large Language Models

naturalness

refusal mechanisms

Innovation

Methods, ideas, or system contributions that make the work stand out.

label enhancement

rigid rejection

variational inference