Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models

πŸ“… 2025-04-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work investigates whether large language models (LLMs) hold stable, intrinsic political beliefs or merely exhibit superficial alignment with their training data. To address this, the authors propose a novel "belief depth" evaluation framework that systematically assesses belief stability and robustness across 12 mainstream LLMs on 19 economic policy questions along two dimensions: argumentative consistency and uncertainty quantified via semantic entropy. An adversarial argumentation challenge set is constructed from the Political Compass Test, with each stance probed by both supportive and opposing arguments and combined with uncertainty calibration and response-consistency analysis. Results reveal that model political leanings are highly topic-specific rather than governed by coherent ideological frameworks; semantic entropy effectively discriminates genuine belief from surface-level alignment (AUROC = 0.78), outperforming conventional classification baselines; and up to 95% of left-leaning and 89% of right-leaning responses withstand adversarial challenges, indicating substantial topic-level belief robustness.

πŸ“ Abstract
Large Language Models (LLMs) are increasingly shaping political discourse, yet their responses often display inconsistency when subjected to scrutiny. While prior research has primarily categorized LLM outputs as left- or right-leaning to assess their political stances, a critical question remains: Do these responses reflect genuine internal beliefs or merely surface-level alignment with training data? To address this, we propose a novel framework for evaluating belief depth by analyzing (1) argumentative consistency and (2) uncertainty quantification. We evaluate 12 LLMs on 19 economic policies from the Political Compass Test, challenging their belief stability with both supportive and opposing arguments. Our analysis reveals that LLMs exhibit topic-specific belief stability rather than a uniform ideological stance. Notably, up to 95% of left-leaning models' responses and 89% of right-leaning models' responses remain consistent under the challenge, enabling semantic entropy to achieve high accuracy (AUROC=0.78), effectively distinguishing surface-level alignment from genuine belief. These findings call into question the assumption that LLMs maintain stable, human-like political ideologies, emphasizing the importance of conducting topic-specific reliability assessments for real-world applications.
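The challenge protocol described in the abstract (elicit a stance on each economic-policy statement, then re-ask after presenting an opposing argument and check whether the stance holds) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation; `query_model` is a hypothetical callable standing in for any LLM API, and stances are assumed to be reduced to simple Agree/Disagree labels.

```python
from typing import Callable, Dict

def challenge_stance(
    query_model: Callable[[str], str],   # hypothetical wrapper around any LLM API
    statement: str,
    counter_argument: str,
) -> Dict[str, str]:
    """Elicit an initial stance on a policy statement, then re-ask after an
    opposing argument and record whether the stance survived the challenge."""
    initial = query_model(
        "Do you agree or disagree with the following statement? "
        f"Answer only 'Agree' or 'Disagree'.\n\nStatement: {statement}"
    ).strip()
    challenged = query_model(
        f"Statement: {statement}\n"
        f"Counter-argument: {counter_argument}\n"
        "Having considered this argument, do you agree or disagree with the "
        "statement? Answer only 'Agree' or 'Disagree'."
    ).strip()
    return {
        "statement": statement,
        "initial_stance": initial,
        "challenged_stance": challenged,
        "consistent": str(initial.lower() == challenged.lower()),
    }
```

Repeating this for each of the 19 statements and aggregating the "consistent" flags per model would yield the kind of topic-level consistency rates the paper reports (e.g., up to 95% for left-leaning responses).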
Problem

Research questions and friction points this paper is trying to address.

Assess whether LLM responses reflect genuine internal beliefs or surface-level alignment with training data
Develop a framework to evaluate belief depth via argumentative consistency and uncertainty quantification
Determine whether LLMs exhibit topic-specific belief stability rather than uniform ideological stances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel framework for evaluating belief depth in LLMs
Analyzes argumentative consistency under supportive and opposing challenges alongside uncertainty quantification
Uses semantic entropy to distinguish genuine belief from surface-level alignment (AUROC = 0.78); see the sketch below
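A rough sketch of how semantic entropy can separate genuine belief from surface-level alignment, assuming repeated samples per question have already been reduced to discrete stance labels (the paper's handling of free-text answers is likely more involved). The data below is purely illustrative; `roc_auc_score` from scikit-learn computes a discrimination score analogous to the reported AUROC.

```python
import math
from collections import Counter
from typing import Sequence

from sklearn.metrics import roc_auc_score

def semantic_entropy(stances: Sequence[str]) -> float:
    """Entropy over semantically distinct answers sampled for one question.
    Low entropy: the model keeps giving the same stance (candidate genuine belief).
    High entropy: the stance varies across samples (surface-level alignment)."""
    counts = Counter(stances)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical example: sampled stances per question, plus a label marking
# whether the stance survived the adversarial challenge (1 = genuine belief).
sampled = [
    ["Agree", "Agree", "Agree", "Agree"],        # stable -> low entropy
    ["Agree", "Disagree", "Agree", "Disagree"],  # unstable -> high entropy
    ["Disagree", "Disagree", "Disagree", "Disagree"],
    ["Agree", "Agree", "Disagree", "Agree"],
]
survived_challenge = [1, 0, 1, 0]

# Use negative entropy as the score so that higher = more belief-like,
# then check how well it separates the two classes (cf. AUROC = 0.78).
scores = [-semantic_entropy(s) for s in sampled]
print(roc_auc_score(survived_challenge, scores))
```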