Do Words Reflect Beliefs? Evaluating Belief Depth in Large Language Models

πŸ“… 2025-04-23
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work investigates whether large language models (LLMs) hold stable, intrinsic political beliefs or merely exhibit superficial alignment with their training data. To address this, the authors propose a novel "belief depth" evaluation framework that systematically assesses belief stability and robustness across 12 mainstream LLMs on 19 economic policy questions along two dimensions: argumentative consistency and uncertainty quantified via semantic entropy. An adversarial argumentation challenge set is constructed from the Political Compass Test, with each stance probed by both supportive and opposing arguments and combined with uncertainty calibration and response-consistency analysis. Results reveal that model political leanings are highly topic-specific rather than governed by coherent ideological frameworks; semantic entropy effectively discriminates genuine belief from surface-level alignment (AUROC = 0.78), outperforming conventional classification baselines; and up to 95% of left-leaning and 89% of right-leaning responses withstand adversarial challenges, indicating substantial topic-level belief robustness.

πŸ“ Abstract
Large Language Models (LLMs) are increasingly shaping political discourse, yet their responses often display inconsistency when subjected to scrutiny. While prior research has primarily categorized LLM outputs as left- or right-leaning to assess their political stances, a critical question remains: Do these responses reflect genuine internal beliefs or merely surface-level alignment with training data? To address this, we propose a novel framework for evaluating belief depth by analyzing (1) argumentative consistency and (2) uncertainty quantification. We evaluate 12 LLMs on 19 economic policies from the Political Compass Test, challenging their belief stability with both supportive and opposing arguments. Our analysis reveals that LLMs exhibit topic-specific belief stability rather than a uniform ideological stance. Notably, up to 95% of left-leaning models' responses and 89% of right-leaning models' responses remain consistent under the challenge, enabling semantic entropy to achieve high accuracy (AUROC=0.78), effectively distinguishing surface-level alignment from genuine belief. These findings call into question the assumption that LLMs maintain stable, human-like political ideologies, emphasizing the importance of conducting topic-specific reliability assessments for real-world applications.
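The challenge protocol described in the abstract (elicit a stance on each economic-policy statement, then re-ask after presenting an opposing argument and check whether the stance holds) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation; `query_model` is a hypothetical callable standing in for any LLM API, and stances are assumed to be reduced to simple Agree/Disagree labels.

```python
from typing import Callable, Dict

def challenge_stance(
    query_model: Callable[[str], str],   # hypothetical wrapper around any LLM API
    statement: str,
    counter_argument: str,
) -> Dict[str, str]:
    """Elicit an initial stance on a policy statement, then re-ask after an
    opposing argument and record whether the stance survived the challenge."""
    initial = query_model(
        "Do you agree or disagree with the following statement? "
        f"Answer only 'Agree' or 'Disagree'.\n\nStatement: {statement}"
    ).strip()
    challenged = query_model(
        f"Statement: {statement}\n"
        f"Counter-argument: {counter_argument}\n"
        "Having considered this argument, do you agree or disagree with the "
        "statement? Answer only 'Agree' or 'Disagree'."
    ).strip()
    return {
        "statement": statement,
        "initial_stance": initial,
        "challenged_stance": challenged,
        "consistent": str(initial.lower() == challenged.lower()),
    }
```

Repeating this for each of the 19 statements and aggregating the "consistent" flags per model would yield the kind of topic-level consistency rates the paper reports (e.g., up to 95% for left-leaning responses).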
Problem

Research questions and friction points this paper is trying to address.

Assess whether LLM responses reflect genuine internal beliefs or surface-level alignment with training data
Develop a framework to evaluate belief depth via argumentative consistency and uncertainty quantification
Determine whether LLMs exhibit topic-specific belief stability rather than uniform ideological stances
Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel framework for evaluating belief depth in LLMs
Analyzes argumentative consistency under supportive and opposing challenges alongside uncertainty quantification
Uses semantic entropy to distinguish genuine belief from surface-level alignment (AUROC = 0.78); see the sketch below
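A rough sketch of how semantic entropy can separate genuine belief from surface-level alignment, assuming repeated samples per question have already been reduced to discrete stance labels (the paper's handling of free-text answers is likely more involved). The data below is purely illustrative; `roc_auc_score` from scikit-learn computes a discrimination score analogous to the reported AUROC.

```python
import math
from collections import Counter
from typing import Sequence

from sklearn.metrics import roc_auc_score

def semantic_entropy(stances: Sequence[str]) -> float:
    """Entropy over semantically distinct answers sampled for one question.
    Low entropy: the model keeps giving the same stance (candidate genuine belief).
    High entropy: the stance varies across samples (surface-level alignment)."""
    counts = Counter(stances)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Hypothetical example: sampled stances per question, plus a label marking
# whether the stance survived the adversarial challenge (1 = genuine belief).
sampled = [
    ["Agree", "Agree", "Agree", "Agree"],        # stable -> low entropy
    ["Agree", "Disagree", "Agree", "Disagree"],  # unstable -> high entropy
    ["Disagree", "Disagree", "Disagree", "Disagree"],
    ["Agree", "Agree", "Disagree", "Agree"],
]
survived_challenge = [1, 0, 1, 0]

# Use negative entropy as the score so that higher = more belief-like,
# then check how well it separates the two classes (cf. AUROC = 0.78).
scores = [-semantic_entropy(s) for s in sampled]
print(roc_auc_score(survived_challenge, scores))
```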