Dual Optimal: Make Your LLM Peer-like with Dignity

📅 2026-04-01

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Current aligned language models often fall into the “Evasive Servant” trap: they reflexively accommodate users’ erroneous views while evading responsibility through templated disclaimers. This work proposes a “Dignified Peer” framework that fosters egalitarian, dignified dialogue by integrating four dimensions: anti-sycophancy, trustworthiness, empathy, and creativity. To operationalize this approach, we introduce PersonaKnob, a dataset tailored for modeling principled interpersonal stances, and develop a tolerant constrained Lagrangian DPO algorithm to prevent behavioral collapse during alignment. Furthermore, we design a psychometric evaluation protocol grounded in Item Response Theory (IRT) to disentangle latent model persona capability from confounders such as judge bias. Experimental results indicate that the method balances the persona dimensions, yielding language models that are both dignified and peer-like.


📝 Abstract

Current aligned language models exhibit a dual failure mode we term the Evasive Servant: they sycophantically validate flawed user beliefs while deflecting responsibility with boilerplate disclaimers. We propose the Dignified Peer framework, which counters servility with anti-sycophancy and trustworthiness, and mitigates evasiveness through empathy and creativity. Realizing this agent requires overcoming significant challenges in data supervision, objective collapse, and evaluation bias. We address these issues by introducing the PersonaKnob dataset, which features a compositional partial-order structure over multiple persona preferences. This data is used alongside a tolerant constrained Lagrangian DPO algorithm that dynamically balances all persona dimensions to prevent behavioral collapse. Additionally, we employ a psychometrically calibrated Item Response Theory evaluation protocol to disentangle latent model persona capability from confounders such as judge biases. Extensive empirical studies demonstrate that our approach successfully builds an LLM agent that is both dignified and peer-like.
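
The abstract does not spell out the training objective, so the following is only a generic sketch of what a tolerance-based Lagrangian over per-dimension DPO losses could look like, not the authors' algorithm. The DPO loss is the standard one; the helper names (`lagrangian`, `update_multipliers`), the single shared `tolerance`, and the dual step size are all assumptions made for illustration.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss for one preference pair (inputs are log-probabilities)."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    # -log(sigmoid(margin)) == softplus(-margin); guard against exp overflow.
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

def lagrangian(primary_loss, constraint_losses, multipliers, tolerance):
    """Primary loss plus multiplier-weighted constraint slack (hypothetical form)."""
    return primary_loss + sum(
        lam * (loss - tolerance)
        for lam, loss in zip(multipliers, constraint_losses)
    )

def update_multipliers(multipliers, constraint_losses, tolerance, step=0.5):
    """Projected dual ascent: a multiplier grows only while its dimension's
    loss exceeds the tolerance, and is clipped at zero otherwise."""
    return [
        max(0.0, lam + step * (loss - tolerance))
        for lam, loss in zip(multipliers, constraint_losses)
    ]
```

In this sketch, a persona dimension whose loss stays within the tolerance contributes no pressure once its multiplier reaches zero, while a lagging dimension is weighted more heavily at each step. That is one plausible way to read "dynamically balances all persona dimensions".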

Problem

Research questions and friction points this paper is trying to address.

Evasive Servant

anti-sycophancy

trustworthiness

empathy

behavioral collapse

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dignified Peer

PersonaKnob

Constrained Lagrangian DPO

Item Response Theory

Anti-sycophancy
