🤖 AI Summary
This study investigates how large language models (LLMs) compare with humans in moral understanding. To address the limitations of conventional majority-voting evaluation, we introduce a Bayesian assessment framework that explicitly disentangles annotator disagreement into aleatoric noise and epistemic uncertainty, enabled by 250,000+ fine-grained annotations from 700 human annotators on over 100,000 social media, news, and forum texts. Leveraging GPU-accelerated large-scale model querying (over 1 million queries), we quantitatively benchmark mainstream LLMs. Results show that LLMs consistently achieve moral judgment accuracy within the top 25% of human annotators and exhibit significantly lower false negative rates, indicating greater sensitivity to moral risks. Our Bayesian modeling approach establishes a more interpretable, robust, and theoretically grounded benchmark for AI ethics evaluation.
📝 Abstract
How well do large language models understand moral dimensions compared to humans?
This first large-scale Bayesian evaluation of market-leading language models provides the answer. In contrast to prior work that relies on deterministic ground truth (majority or inclusion rules), we model annotator disagreement to capture both aleatoric uncertainty (inherent human disagreement) and epistemic uncertainty (model domain sensitivity). We evaluate top language models (Claude Sonnet 4, DeepSeek-V3, Llama 4 Maverick) across 250K+ annotations from ~700 annotators on 100K+ texts spanning social media, news, and forums.
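To make the uncertainty decomposition concrete, the sketch below shows one common way to separate aleatoric from epistemic uncertainty given repeated binary annotations of the same text: a per-text Beta-Binomial posterior whose predictive entropy splits into expected entropy (aleatoric) and mutual information (epistemic). This is an illustrative sketch only; the function name, the Beta(1, 1) prior, and the Monte Carlo estimate are assumptions for exposition, not the paper's exact model.

```python
import numpy as np

def decompose_uncertainty(k, n, a0=1.0, b0=1.0, n_samples=20_000, seed=0):
    """Posterior over a per-text label probability theta, given k positive
    labels out of n annotations, under an assumed Beta(a0, b0) prior.

    Returns (aleatoric, epistemic):
      aleatoric = E_theta[H(Bernoulli(theta))]        # irreducible disagreement
      epistemic = H(Bernoulli(E[theta])) - aleatoric  # reducible, BALD-style
    """
    rng = np.random.default_rng(seed)
    theta = rng.beta(a0 + k, b0 + n - k, size=n_samples)  # posterior samples

    def bern_entropy(p):
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    total = bern_entropy(theta.mean())       # predictive (total) entropy
    aleatoric = bern_entropy(theta).mean()   # expected entropy under the posterior
    epistemic = max(total - aleatoric, 0.0)  # mutual information between label and theta
    return aleatoric, epistemic

# Example: 7 of 10 annotators flag a text; most of the disagreement is aleatoric.
print(decompose_uncertainty(k=7, n=10))
```

With many annotations per text the epistemic term shrinks toward zero, which is what lets the remaining disagreement be treated as genuine human variation rather than measurement noise.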
Our GPU-optimized Bayesian framework processed 1M+ model queries, revealing that AI models typically rank among the top 25% of human annotators, with balanced accuracy well above the human average. Importantly, we find that AI models produce far fewer false negatives than humans, highlighting their more sensitive detection of moral content.
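As a rough illustration of the ranking metrics, the snippet below computes balanced accuracy and false negative rate for a model's predictions and its percentile rank against a pool of annotator scores. The helper names and the synthetic labels and scores are made up for the example; they are not the paper's data or evaluation code.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (TPR) and specificity (TNR)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tpr = (y_pred[y_true == 1] == 1).mean()
    tnr = (y_pred[y_true == 0] == 0).mean()
    return (tpr + tnr) / 2

def false_negative_rate(y_true, y_pred):
    """Share of positive (morally salient) items the predictor misses."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return (y_pred[y_true == 1] == 0).mean()

def percentile_rank(model_score, annotator_scores):
    """Fraction of annotators the model matches or exceeds."""
    return (model_score >= np.asarray(annotator_scores)).mean()

# Toy run with synthetic labels and annotator scores (not the paper's data).
rng = np.random.default_rng(0)
reference = rng.integers(0, 2, size=1_000)                    # consensus labels
model_pred = np.where(rng.random(1_000) < 0.9, reference, 1 - reference)
annotator_bacc = rng.uniform(0.60, 0.85, size=700)            # hypothetical human scores

bacc = balanced_accuracy(reference, model_pred)
fnr = false_negative_rate(reference, model_pred)
print(f"balanced accuracy: {bacc:.3f}  false negative rate: {fnr:.3f}")
print(f"model outranks {percentile_rank(bacc, annotator_bacc):.0%} of annotators")
```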