MIST: Towards Multi-dimensional Implicit Bias and Stereotype Evaluation of LLMs via Theory of Mind

📅 2025-06-17
🤖 AI Summary
This study addresses the challenge of detecting implicit bias in large language models (LLMs), which conventional direct-query methods largely miss because of social desirability effects. It proposes a multi-dimensional implicit bias evaluation framework that combines the Stereotype Content Model (SCM) with Theory of Mind (ToM). The framework decomposes bias along three theoretically grounded dimensions (Competence, Sociability, and Morality) and uses two indirect probes designed not to trigger model avoidance: the Word Association Bias Test (WABT) and the Affective Attribution Test (AAT). Evaluation across eight state-of-the-art LLMs reveals three patterns: pervasive sociability bias, divergence of bias across dimensions, and asymmetric stereotype amplification. The authors present the framework as a more robust methodology than direct-query approaches for identifying the structural nature of implicit bias.

📝 Abstract
Theory of Mind (ToM) in Large Language Models (LLMs) refers to their capacity for reasoning about mental states, yet failures in this capacity often manifest as systematic implicit bias. Evaluating this bias is challenging, as conventional direct-query methods are susceptible to social desirability effects and fail to capture its subtle, multi-dimensional nature. To this end, we propose an evaluation framework that leverages the Stereotype Content Model (SCM) to reconceptualize bias as a multi-dimensional failure in ToM across Competence, Sociability, and Morality. The framework introduces two indirect tasks: the Word Association Bias Test (WABT) to assess implicit lexical associations and the Affective Attribution Test (AAT) to measure covert affective leanings, both designed to probe latent stereotypes without triggering model avoidance. Extensive experiments on 8 State-of-the-Art LLMs demonstrate our framework's capacity to reveal complex bias structures, including pervasive sociability bias, multi-dimensional divergence, and asymmetric stereotype amplification, thereby providing a more robust methodology for identifying the structural nature of implicit bias.
Problem

Research questions and friction points this paper is trying to address.

Assessing multi-dimensional implicit bias in LLMs via Theory of Mind
Overcoming limitations of direct-query methods in bias evaluation
Measuring latent stereotypes without triggering model avoidance behaviors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Stereotype Content Model for multi-dimensional bias evaluation
Introduces Word Association Bias Test for implicit lexical associations
Uses Affective Attribution Test to measure covert affective leanings
Authors
Yanlin Li
Carnegie Mellon University
Computer Security
Hao Liu
School of Software, Shandong University
Huimin Liu
School of Psychology, Hainan Normal University
Yinwei Wei
Shandong University | National University of Singapore
Multimedia Computing, Information Retrieval, Recommender System
Yupeng Hu
School of Software, Shandong University