Measuring Spiritual Values and Bias of Large Language Models

📅 2024-10-15
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study identifies and quantifies latent biases in large language models (LLMs) concerning spiritual values—and demonstrates their substantive impact on social fairness tasks, particularly hate speech detection. Method: We construct a validated Spiritual Values Scale, conduct cross-model behavioral analysis, and implement continual pretraining using authoritative spiritual corpora to develop a novel “spirituality-aware bias mitigation” paradigm. Contribution/Results: We find that mainstream LLMs exhibit high diversity in spiritual stances; critically, their spiritual orientation significantly modulates sensitivity in detecting hate speech targeting different demographic groups. Continual pretraining with spiritually annotated text reduces value-aligned bias by 37%, as measured via standardized fairness and value-consistency metrics. This work establishes the first quantifiable, intervention-ready framework for aligning LLMs with pluralistic spiritual values—advancing both value-sensitive NLP and equitable AI deployment.

📝 Abstract
Large language models (LLMs) have become integral tools for users from various backgrounds. LLMs, trained on vast corpora, reflect the linguistic and cultural nuances embedded in their pre-training data. However, the values and perspectives inherent in this data can influence the behavior of LLMs, leading to potential biases. As a result, the use of LLMs in contexts involving spiritual or moral values necessitates careful consideration of these underlying biases. Our work starts with verification of our hypothesis by testing the spiritual values of popular LLMs. Experimental results show that LLMs' spiritual values are quite diverse, as opposed to the stereotype of atheists or secularists. We then investigate how different spiritual values affect LLMs in social-fairness scenarios (e.g., hate speech identification). Our findings reveal that different spiritual values indeed lead to different sensitivity to different hate target groups. Furthermore, we propose to continue pre-training LLMs on spiritual texts, and empirical results demonstrate the effectiveness of this approach in mitigating spiritual bias.
Problem

Research questions and friction points this paper is trying to address.

Measuring spiritual values and biases in large language models
Investigating impact of spiritual values on hate speech detection
Mitigating spiritual bias through continued pre-training on spiritual texts
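The first question above, measuring an LLM's spiritual values, amounts to administering a Likert-style scale and aggregating responses into a value profile. A minimal sketch of that aggregation step follows; the items, subscale names, and reverse-keying here are purely illustrative assumptions, not the paper's validated Spiritual Values Scale.

```python
# Hypothetical sketch: scoring Likert responses to a spiritual-values scale.
# Items, subscales, and reverse-keying are illustrative only; the paper's
# validated Spiritual Values Scale is not reproduced here.

SCALE = [
    {"item": "A higher power guides events in the world.",
     "subscale": "transcendence", "reverse": False},
    {"item": "Moral questions can be settled without any spiritual framework.",
     "subscale": "transcendence", "reverse": True},
    {"item": "Regular contemplative practice is important to me.",
     "subscale": "practice", "reverse": False},
]

def score_profile(responses):
    """Average 1-5 Likert responses per subscale, reverse-keying where needed."""
    totals, counts = {}, {}
    for spec, r in zip(SCALE, responses):
        if not 1 <= r <= 5:
            raise ValueError("Likert responses must be in 1..5")
        value = 6 - r if spec["reverse"] else r  # flip reverse-keyed items
        sub = spec["subscale"]
        totals[sub] = totals.get(sub, 0) + value
        counts[sub] = counts.get(sub, 0) + 1
    return {sub: totals[sub] / counts[sub] for sub in totals}

# A model answering 5, 1, 3 to the three items above scores high on
# the (hypothetical) transcendence subscale and mid on practice.
profile = score_profile([5, 1, 3])
```

In practice each item would be posed to the model as a prompt and the chosen Likert option parsed from its reply; the scoring above is the model-agnostic part.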
Innovation

Methods, ideas, or system contributions that make the work stand out.

Testing LLMs' diverse spiritual values experimentally
Investigating spiritual values' impact on hate speech sensitivity
Mitigating bias by pre-training on spiritual texts
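The second point above, that spiritual values shift hate speech sensitivity across target groups, can be quantified as per-group recall on hateful examples plus a disparity score. This is a generic sketch with toy data and a simple max-gap measure, assumed here for illustration rather than taken from the paper.

```python
# Sketch: per-group sensitivity (recall on hateful posts) and a disparity
# measure. Group names and predictions below are toy values, not paper data.
from collections import defaultdict

def per_group_recall(examples):
    """Recall on hateful examples, broken down by target group.

    `examples` is a list of (target_group, gold_is_hate, predicted_is_hate).
    """
    hits, total = defaultdict(int), defaultdict(int)
    for group, gold, pred in examples:
        if gold:  # only hateful examples count toward recall
            total[group] += 1
            hits[group] += int(pred)
    return {g: hits[g] / total[g] for g in total}

def sensitivity_gap(recalls):
    """Max difference in recall across groups: one simple disparity measure."""
    values = list(recalls.values())
    return max(values) - min(values)

# Toy predictions from one model on hateful posts targeting two groups:
data = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_b", True, True), ("group_b", True, False),
]
recalls = per_group_recall(data)  # group_a caught 2/2, group_b only 1/2
gap = sensitivity_gap(recalls)
```

Comparing these gaps before and after continued pre-training on spiritual texts would be one concrete way to check whether the mitigation narrows group-wise disparities.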