🤖 AI Summary
This study addresses the important but underexplored problem of implicit value alignment: how well mainstream large language models (LLMs) reflect human values (e.g., environmentalism, charity, diversity) when completing subjective everyday tasks. The authors propose a behavioral auditing paradigm that combines crowdsourced human judgments with systematic analysis of responses from six popular LLMs, enabling cross-subject value comparisons (human vs. model and model vs. model). The audit yields three key findings: (1) LLMs often deviate significantly from the human value consensus; (2) models disagree substantially with one another, a pattern of value fragmentation; and (3) these misalignments intensify in context-sensitive, culturally loaded scenarios. The work highlights challenges for current LLM alignment efforts at the level of implicit values, exposing gaps that explicit instruction following and safety fine-tuning do not close, and it lays a methodological foundation for explainable, auditable, value-aware AI development grounded in behavioral evaluation.
📝 Abstract
Large language models (LLMs) can underpin AI assistants that help users with everyday tasks, such as by making recommendations or performing basic computation. Despite AI assistants' promise, little is known about the implicit values these assistants display while completing subjective everyday tasks. Humans may consider values like environmentalism, charity, and diversity. To what extent do LLMs exhibit these values in completing everyday tasks? How do they compare with humans? We answer these questions by auditing how six popular LLMs complete 30 everyday tasks, comparing LLMs to each other and to 100 human crowdworkers from the US. We find LLMs often do not align with humans, nor with other LLMs, in the implicit values exhibited.
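The abstract describes the audit only at a high level, so below is a minimal sketch of how such a cross-subject value comparison could be organized. All names (`responses`, `exhibits_value`, `value_rate`) and the simple rate-of-exhibition metric are illustrative assumptions, not the paper's actual protocol or coding scheme.

```python
# Hypothetical audit data: for each subject (a model or the human crowd),
# a list of task responses already coded for whether they exhibit a target
# value (e.g., environmentalism). The response-coding step is not shown.
responses = {
    "human_crowd": [{"task": "plan a picnic", "exhibits_value": True},
                    {"task": "pick a gift",   "exhibits_value": False}],
    "model_a":     [{"task": "plan a picnic", "exhibits_value": False},
                    {"task": "pick a gift",   "exhibits_value": False}],
    "model_b":     [{"task": "plan a picnic", "exhibits_value": True},
                    {"task": "pick a gift",   "exhibits_value": True}],
}

def value_rate(coded_responses):
    """Fraction of a subject's responses that exhibit the target value."""
    return sum(r["exhibits_value"] for r in coded_responses) / len(coded_responses)

# Compare each model against the human baseline (and, implicitly, each other).
rates = {subject: value_rate(coded) for subject, coded in responses.items()}
human_rate = rates["human_crowd"]
for subject, rate in rates.items():
    if subject == "human_crowd":
        continue
    print(f"{subject}: value rate {rate:.2f}, "
          f"deviation from humans {rate - human_rate:+.2f}")
```

In the study itself, such comparisons would presumably be made per value and per task across the 30 tasks, six LLMs, and 100 crowdworkers; the sketch only illustrates the human-model and model-model comparison structure.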