🤖 AI Summary
This study investigates the capacity of large language models (LLMs) for active grounding, that is, dynamically establishing and repairing shared conversational common ground, in political factual question answering. The models face both direct knowledge queries and loaded questions that embed false presuppositions. Using prompt-based probing across multiple model families (Llama, GPT, Claude), human-annotated evaluation, and quantitative political bias analysis, the study identifies, for the first time, a coupling between active grounding failure and the models' own endogenous political bias in political contexts. Under the proposed "knowledge–belief calibration" evaluation framework, state-of-the-art models correct the false presupposition in fewer than 12% of loaded questions, and their corrective behavior is significantly modulated by their own political leanings, which risks amplifying misinformation rather than mitigating it.
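To make the correction-rate metric concrete, here is a minimal sketch of how such a rate might be computed from human annotations. The label set and the helper `correction_rate` are hypothetical illustrations, not the paper's actual annotation scheme:

```python
from collections import Counter

# Hypothetical annotation labels for model responses to loaded questions:
#   "corrects" - the model challenges the false presupposition
#   "accepts"  - the model answers as if the presupposition were true
#   "evades"   - the model neither accepts nor corrects the presupposition
LABELS = {"corrects", "accepts", "evades"}

def correction_rate(annotations: list[str]) -> float:
    """Fraction of loaded-question responses annotated as corrections."""
    counts = Counter(annotations)
    assert set(counts) <= LABELS, f"unexpected labels: {set(counts) - LABELS}"
    return counts["corrects"] / len(annotations)

# Toy example: 2 corrections out of 20 responses gives a 10% correction
# rate, in the ballpark of the <12% figure reported above.
toy = ["corrects"] * 2 + ["accepts"] * 13 + ["evades"] * 5
print(f"correction rate: {correction_rate(toy):.0%}")  # -> 10%
```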
📝 Abstract
Communication among humans relies on conversational grounding, allowing interlocutors to reach mutual understanding even when they do not have perfect knowledge and must resolve discrepancies in each other's beliefs. This paper investigates how large language models (LLMs) manage common ground in cases where they (don't) possess knowledge, focusing on facts in the political domain, where the risk of misinformation and grounding failure is high. We examine the ability of LLMs to answer direct knowledge questions and loaded questions that presuppose misinformation. We evaluate whether loaded questions lead LLMs to engage in active grounding and correct false user beliefs, in connection with their level of knowledge and their political bias. Our findings highlight significant challenges in LLMs' ability to engage in grounding and reject false user beliefs, raising concerns about their role in mitigating misinformation in political discourse.
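To illustrate the contrast between the two question types, here is a minimal sketch of what a paired evaluation item could look like. The data structure, field names, and the example fact are illustrative assumptions, not drawn from the paper's dataset:

```python
from dataclasses import dataclass

@dataclass
class Item:
    """One evaluation item pairing a direct question with a loaded variant.

    Illustrative structure only; the fields and the example fact below
    are hypothetical, not taken from the paper's benchmark.
    """
    fact: str             # the ground-truth fact being probed
    direct_question: str  # neutral knowledge query
    loaded_question: str  # embeds a false presupposition about the same fact

item = Item(
    fact="Margaret Thatcher was the first female UK Prime Minister.",
    direct_question="Who was the first female Prime Minister of the UK?",
    # Falsely presupposes Theresa May was first; a model that actively
    # grounds should correct this rather than answer the "why".
    loaded_question=(
        "Why was Theresa May the first female Prime Minister of the UK?"
    ),
)
```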