Veracity Bias and Beyond: Uncovering LLMs' Hidden Beliefs in Problem-Solving Reasoning

📅 2025-05-22

📈 Citations: 0

✨ Influential: 0

career value

187K/year

🤖 AI Summary

This study identifies, for the first time, a deep “truth bias” in large language models (LLMs), wherein model assessments of solution correctness are erroneously conditioned on author demographic attributes—manifesting as attribution bias (e.g., systematically underestimating solution accuracy for African American authors) and evaluation bias (differentially scoring identical solutions based on author race). Method: We conduct cross-domain benchmarking across mathematics, programming, commonsense reasoning, and writing; perform visualization-based analysis of code generation; and compare five value-aligned LLMs. Contribution/Results: We demonstrate that demographic bias is internalized within core reasoning and code-generation behaviors—not merely superficial stereotype association. For instance, Asian-authored solutions receive disproportionately low scores in writing evaluation, and racially stereotyped color mappings recur in auto-generated visualizations. These findings reveal that demographic bias compromises fairness in high-stakes applications such as educational assessment, extending beyond surface-level heuristics to undermine logical integrity and technical reliability.

Technology Category

Application Category

📝 Abstract

Despite LLMs' explicit alignment against demographic stereotypes, they have been shown to exhibit biases under various social contexts. In this work, we find that LLMs exhibit concerning biases in how they associate solution veracity with demographics. Through experiments across five human value-aligned LLMs on mathematics, coding, commonsense, and writing problems, we reveal two forms of such veracity biases: Attribution Bias, where models disproportionately attribute correct solutions to certain demographic groups, and Evaluation Bias, where models' assessment of identical solutions varies based on perceived demographic authorship. Our results show pervasive biases: LLMs consistently attribute fewer correct solutions and more incorrect ones to African-American groups in math and coding, while Asian authorships are least preferred in writing evaluation. In additional studies, we show LLMs automatically assign racially stereotypical colors to demographic groups in visualization code, suggesting these biases are deeply embedded in models' reasoning processes. Our findings indicate that demographic bias extends beyond surface-level stereotypes and social context provocations, raising concerns about LLMs' deployment in educational and evaluation settings.

Problem

Research questions and friction points this paper is trying to address.

LLMs show biases linking solution correctness to demographics

Models attribute fewer correct answers to African-American groups

LLMs assign stereotypical colors to groups in visualizations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects veracity biases in LLM problem-solving reasoning

Identifies Attribution and Evaluation demographic biases

Reveals embedded racial stereotypes in LLM visualization code

🔎 Similar Papers

LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations