🤖 AI Summary
This study identifies, for the first time, a deep “truth bias” in large language models (LLMs), wherein model assessments of solution correctness are erroneously conditioned on author demographic attributes—manifesting as attribution bias (e.g., systematically underestimating solution accuracy for African American authors) and evaluation bias (differentially scoring identical solutions based on author race).
Method: We conduct cross-domain benchmarking across mathematics, programming, commonsense reasoning, and writing; perform visualization-based analysis of code generation; and compare five value-aligned LLMs.
Contribution/Results: We demonstrate that demographic bias is internalized within core reasoning and code-generation behaviors—not merely superficial stereotype association. For instance, Asian-authored solutions receive disproportionately low scores in writing evaluation, and racially stereotyped color mappings recur in auto-generated visualizations. These findings reveal that demographic bias compromises fairness in high-stakes applications such as educational assessment, extending beyond surface-level heuristics to undermine logical integrity and technical reliability.
📝 Abstract
Despite LLMs' explicit alignment against demographic stereotypes, they have been shown to exhibit biases under various social contexts. In this work, we find that LLMs exhibit concerning biases in how they associate solution veracity with demographics. Through experiments across five human value-aligned LLMs on mathematics, coding, commonsense, and writing problems, we reveal two forms of such veracity biases: Attribution Bias, where models disproportionately attribute correct solutions to certain demographic groups, and Evaluation Bias, where models' assessment of identical solutions varies based on perceived demographic authorship. Our results show pervasive biases: LLMs consistently attribute fewer correct solutions and more incorrect ones to African-American groups in math and coding, while Asian authorships are least preferred in writing evaluation. In additional studies, we show LLMs automatically assign racially stereotypical colors to demographic groups in visualization code, suggesting these biases are deeply embedded in models' reasoning processes. Our findings indicate that demographic bias extends beyond surface-level stereotypes and social context provocations, raising concerns about LLMs' deployment in educational and evaluation settings.