The Collapse of Heterogeneity in Silicon Philosophers

📅 2026-04-26

🤖 AI Summary

This study investigates whether large language models (LLMs) compress the diversity of human philosophical viewpoints when they simulate philosophers. Using the actual stances of 277 professional philosophers drawn from PhilPeople profiles, the authors evaluate seven open-source and proprietary LLMs on two abilities: reproducing individual philosophical positions and preserving the correlation structure across questions. They also test robustness with Direct Preference Optimization (DPO) fine-tuning. The LLMs systematically produce spurious consensus in philosophical judgment. They substantially overestimate the correlations between distinct views, which collapses the heterogeneity present in the human data. This distortion is partly associated with a specialist effect, in which models implicitly assume that specialists in a domain hold highly similar views. The findings are validated against the full PhilPapers 2020 Survey (N = 1785).
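The two evaluation targets named above, matching individual stances and preserving cross-question correlations, can be illustrated with a minimal Python sketch. It assumes each respondent's answers are coded as numbers on a hypothetical -1 (reject) / 1 (accept) scale. The helper names `pearson`, `mean_abs_correlation` and `position_agreement` are hypothetical, and the summary does not state the paper's exact metrics. Under these assumptions, over-correlation shows up as a higher mean absolute pairwise correlation in the simulated responses than in the human ones.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences; 0.0 if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mean_abs_correlation(responses):
    """Mean |r| over all question pairs.

    responses: one list per respondent, each holding a numeric stance per question
    (hypothetical coding). Higher values mean more tightly coupled views.
    """
    n_questions = len(responses[0])
    # Transpose so each column holds every respondent's answer to one question.
    columns = [[row[q] for row in responses] for q in range(n_questions)]
    pairs = list(combinations(range(n_questions), 2))
    return sum(abs(pearson(columns[i], columns[j])) for i, j in pairs) / len(pairs)


def position_agreement(human, model):
    """Fraction of (respondent, question) cells where the simulated stance equals the real one."""
    cells = [(h, m) for h_row, m_row in zip(human, model) for h, m in zip(h_row, m_row)]
    return sum(h == m for h, m in cells) / len(cells)


# Toy data (invented for illustration, not from the paper): 4 respondents x 3 questions.
human = [
    [1, -1, 1],
    [1, 1, -1],
    [-1, 1, 1],
    [-1, -1, -1],
]
# A simulated panel whose answers move in lockstep across questions.
model = [
    [1, 1, 1],
    [1, 1, 1],
    [-1, -1, -1],
    [-1, -1, -1],
]

print(mean_abs_correlation(human))      # 0.0
print(mean_abs_correlation(model))      # 1.0
print(position_agreement(human, model))  # about 0.667
```

In this toy data the simulated panel still matches two thirds of the individual answers, yet its cross-question correlations rise from 0 to 1. Per-answer accuracy alone would miss that kind of collapse, which is why the correlation structure is checked separately.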

📝 Abstract

Silicon samples are increasingly used as a low-cost substitute for human panels and have been shown to reproduce aggregate human opinion with high fidelity. We show that, in the alignment-relevant domain of philosophy, silicon samples systematically collapse heterogeneity. Using data from $N = 277$ professional philosophers drawn from PhilPeople profiles, we evaluate seven proprietary and open-source large language models on their ability to replicate individual philosophical positions and to preserve cross-question correlation structures across philosophical domains. We find that language models substantially over-correlate philosophical judgments, producing artificial consensus across domains. This collapse is associated in part with specialist effects, whereby models implicitly assume that domain specialists hold highly similar philosophical views. We assess the robustness of these findings by studying the impact of DPO fine-tuning and by validating results against the full PhilPapers 2020 Survey ($N = 1785$). We conclude by discussing implications for alignment, evaluation, and the use of silicon samples as substitutes for human judgment. The code of this project can be found at https://github.com/stanford-del/silicon-philosophers.
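For reference, the robustness check above relies on DPO. The standard DPO objective (Rafailov et al., 2023) is given below. The abstract does not say how the authors build preference pairs, so reading $y_w$ as a preferred answer (for instance, a philosopher's actual stance) and $y_l$ as a dispreferred one is an assumption. $\pi_{\mathrm{ref}}$ is the frozen starting model, $\sigma$ is the logistic function, and $\beta$ controls how far the tuned policy $\pi_\theta$ may drift from the reference.

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta; \pi_{\mathrm{ref}})
  = -\,\mathbb{E}_{(x,\, y_w,\, y_l) \sim \mathcal{D}}
    \left[ \log \sigma\!\left(
      \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
      - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
    \right) \right]
```

Minimizing this loss raises the likelihood of $y_w$ relative to $y_l$ without training a separate reward model.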

Problem

Research questions and friction points this paper is trying to address.

silicon samples

heterogeneity collapse

philosophical alignment

large language models

human judgment substitution

Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneity collapse

silicon philosophers

philosophical alignment