DialDefer: A Framework for Detecting and Mitigating LLM Dialogic Deference

📅 2026-01-15
🤖 AI Summary
This study addresses significant judgment shifts in large language models (LLMs) during dialogue evaluation caused by framing effects, such as whether a claim is presented as a direct assertion or attributed to a speaker, which undermine the models' consistency and fairness as judges. The work introduces and quantifies the phenomenon of "Dialogic Deference," proposing a directional bias metric, the Dialogic Deference Score (DDS), that reveals how LLMs respond differently to human versus AI speakers. Analyzing over 3,000 dialogues across nine domains and four models, the research demonstrates pronounced framing effects, with |DDS| reaching up to 87 percentage points and amplifying two- to four-fold in real-world Reddit conversations. Although the proposed mitigation strategies reduce deference tendencies, they often overcorrect into skepticism, highlighting a persistent model-calibration challenge.

📝 Abstract
LLMs are increasingly used as third-party judges, yet their reliability when evaluating speakers in dialogue remains poorly understood. We show that LLMs judge identical claims differently depending on framing: the same content elicits different verdicts when presented as a statement to verify ("Is this statement correct?") versus attributed to a speaker ("Is this speaker correct?"). We call this dialogic deference and introduce DialDefer, a framework for detecting and mitigating these framing-induced judgment shifts. Our Dialogic Deference Score (DDS) captures directional shifts that aggregate accuracy obscures. Across nine domains, 3k+ instances, and four models, conversational framing induces large shifts (|DDS| up to 87pp, p<.0001) while accuracy remains stable (<2pp), with effects amplifying 2-4x on naturalistic Reddit conversations. Models can shift toward agreement (deference) or disagreement (skepticism) depending on domain -- the same model ranges from DDS = -53 on graduate-level science to +58 on social judgment. Ablations reveal that human-vs-LLM attribution drives the largest shifts (17.7pp swing), suggesting models treat disagreement with humans as more costly than with AI. Mitigation attempts reduce deference but can over-correct into skepticism, framing this as a calibration problem beyond accuracy optimization.
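The paper does not reproduce the DDS formula here, but the abstract describes it as a signed, directional shift measured in percentage points. A minimal sketch under that assumption: DDS is taken as the difference in agreement rate between the speaker-attributed framing ("Is this speaker correct?") and the plain statement framing ("Is this statement correct?"), so positive values indicate deference and negative values indicate skepticism. The function name and verdict encoding below are illustrative, not from the paper.

```python
def dds(statement_verdicts, speaker_verdicts):
    """Hypothetical Dialogic Deference Score sketch.

    Each list holds binary verdicts (1 = model agrees with the claim,
    0 = disagrees) for the same claims under the two framings.
    Returns the signed shift in agreement rate, in percentage points:
    positive = deference (more agreement when attributed to a speaker),
    negative = skepticism.
    """
    assert statement_verdicts and speaker_verdicts
    rate = lambda vs: sum(vs) / len(vs)
    return 100 * (rate(speaker_verdicts) - rate(statement_verdicts))

# Illustrative data: 60% agreement under statement framing,
# 80% under speaker-attributed framing -> DDS of +20pp.
score = dds([1] * 60 + [0] * 40, [1] * 80 + [0] * 20)
print(round(score, 1))
```

The aggregate-accuracy blind spot the abstract describes falls out of this definition: equal-and-opposite flips across instances can leave overall accuracy nearly unchanged while DDS is large.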
Problem

Research questions and friction points this paper is trying to address.

dialogic deference
LLM judgment bias
framing effect
conversational evaluation
attribution bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

dialogic deference
framing effect
LLM judgment bias
Dialogic Deference Score
calibration
Parisa Rabbani
University of Illinois Urbana-Champaign
Priyam Sahoo
University of Illinois Urbana-Champaign
Ruben Mathew
University of Illinois Urbana-Champaign
Aishee Mondal
University of Illinois Urbana-Champaign
Harshita Ketharaman
University of Illinois Urbana-Champaign
Nimet Beyza Bozdag
University of Illinois Urbana-Champaign
NLP, Conversational AI
Dilek Hakkani-Tür
Professor of Computer Science, Univ. Illinois Urbana-Champaign
Speech and Language Processing, Dialogue Systems, Spoken Language Understanding, Machine Learning