A word association network methodology for evaluating implicit biases in LLMs compared to humans

📅 2025-10-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Quantifying and interpretably evaluating implicit social biases in large language models (LLMs) remains challenging. Method: The authors propose a semantic priming paradigm grounded in word association networks, using prompt engineering to construct interpretable association graphs and combining them with network analysis and human cognitive experiments to directly compare bias patterns between LLMs and human populations along dimensions such as gender, religion, ethnicity, sexual orientation, and political party. Contribution/Results: The work introduces a graph-based formalization of semantic priming effects, enabling standardized bias measurement across models and populations. Empirical evaluation on several widely used LLMs and human participants demonstrates the framework's validity and scalability. Results reveal both convergences and divergences between LLM bias profiles and human norms, highlighting fairness gaps. The approach establishes an interpretable paradigm for assessing LLM fairness and societal alignment.
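The graph-construction step can be pictured with a short sketch. The snippet below is a hypothetical illustration, not the paper's actual prompts or code: `get_associations` stands in for whatever prompt-based elicitation the LLM (or a human participant) provides, and the network is grown breadth-first with networkx.

```python
# Hypothetical sketch of prompt-based word association network construction.
# `get_associations` is a stand-in for an LLM (or human) elicitation step;
# the paper's actual prompts, models, and cue lists are not reproduced here.
import networkx as nx

def get_associations(cue: str, n: int = 3) -> list[str]:
    """Placeholder for a prompt such as:
    'List the first {n} words that come to mind for: {cue}'."""
    canned = {  # toy responses for illustration only
        "nurse": ["hospital", "woman", "care"],
        "doctor": ["hospital", "man", "medicine"],
        "woman": ["mother", "care", "home"],
        "man": ["strong", "work", "career"],
    }
    return canned.get(cue, [])[:n]

def build_network(cues: list[str], depth: int = 2) -> nx.DiGraph:
    """Grow the graph breadth-first: each cue is prompted for associations,
    and new responses become cues for the next round, up to `depth` rounds."""
    g = nx.DiGraph()
    frontier, seen = list(cues), set(cues)
    for _ in range(depth):
        next_frontier = []
        for cue in frontier:
            for resp in get_associations(cue):
                # Edge weights count how often an association recurs.
                w = g.get_edge_data(cue, resp, default={"weight": 0})["weight"]
                g.add_edge(cue, resp, weight=w + 1)
                if resp not in seen:
                    seen.add(resp)
                    next_frontier.append(resp)
        frontier = next_frontier
    return g

g = build_network(["nurse", "doctor"])
print(sorted(g.edges(data="weight")))
```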

📝 Abstract
As large language models (LLMs) become increasingly integrated into our lives, their inherent social biases remain a pressing concern. Detecting and evaluating these biases can be challenging because they are often implicit rather than explicit in nature, so developing evaluation methods that assess the implicit knowledge representations of LLMs is essential. We present a novel word association network methodology for evaluating implicit biases in LLMs based on simulating semantic priming within LLM-generated word association networks. Our prompt-based approach taps into the implicit relational structures encoded in LLMs, providing both quantitative and qualitative assessments of bias. Unlike most prompt-based evaluation methods, our method enables direct comparisons between various LLMs and humans, providing a valuable point of reference and offering new insights into the alignment of LLMs with human cognition. To demonstrate the utility of our methodology, we apply it to both humans and several widely used LLMs to investigate social biases related to gender, religion, ethnicity, sexual orientation, and political party. Our results reveal both convergences and divergences between LLM and human biases, providing new perspectives on the potential risks of using LLMs. Our methodology contributes to a systematic, scalable, and generalizable framework for evaluating and comparing biases across multiple LLMs and humans, advancing the goal of transparent and socially responsible language technologies.
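One way to read "simulating semantic priming" operationally is spreading activation over the association graph. The sketch below assumes the graph built in the previous snippet and a simple decay-weighted propagation; the decay rate, step count, and scoring contrast are assumptions, and the paper's exact priming measure may differ.

```python
# Minimal spreading-activation sketch over a weighted association graph
# (assumes the `build_network` graph above); decay and step count are
# illustrative assumptions, not the paper's specification.
import networkx as nx

def priming(g: nx.DiGraph, prime: str, target: str,
            steps: int = 3, decay: float = 0.5) -> float:
    """Propagate activation outward from `prime` for `steps` hops; the
    activation accumulating at `target` proxies the priming effect."""
    act = {prime: 1.0}
    total = 0.0
    for _ in range(steps):
        nxt: dict[str, float] = {}
        for node, a in act.items():
            out = g.out_degree(node, weight="weight")
            if not out:
                continue
            # Split each node's activation among neighbors by edge weight.
            for _, nbr, w in g.out_edges(node, data="weight"):
                nxt[nbr] = nxt.get(nbr, 0.0) + decay * a * w / out
        total += nxt.get(target, 0.0)
        act = nxt
    return total

# A bias score can contrast group primes against the same attribute, e.g.:
# bias = priming(g, "woman", "care") - priming(g, "man", "care")
```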
Problem

Research questions and friction points this paper is trying to address.

Evaluating implicit social biases in large language models
Developing word association networks to assess implicit biases
Comparing bias patterns between LLMs and human cognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Word association networks simulate semantic priming
Prompt-based approach quantifies implicit relational structures
Directly compares bias across multiple LLMs and humans (see the sketch after this list)
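The cross-population comparison noted in the last bullet could be as simple as correlating bias scores over shared prime-attribute pairs. All numbers below are toy values for illustration, not results from the paper, and the actual statistics used may differ.

```python
# Hypothetical alignment check: correlate LLM and human bias scores over
# shared (prime, attribute) pairs. All numbers are toy values, not the
# paper's results.
import numpy as np

llm_bias = {("woman", "care"): 0.42, ("man", "care"): 0.11,
            ("woman", "career"): 0.09, ("man", "career"): 0.35}
human_bias = {("woman", "care"): 0.30, ("man", "care"): 0.14,
              ("woman", "career"): 0.18, ("man", "career"): 0.28}

# Restrict to pairs measured in both populations, then correlate.
pairs = sorted(set(llm_bias) & set(human_bias))
x = np.array([llm_bias[p] for p in pairs])
y = np.array([human_bias[p] for p in pairs])
print(f"LLM-human alignment (Pearson r): {np.corrcoef(x, y)[0, 1]:.2f}")
```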
🔎 Similar Papers
No similar papers found.