🤖 AI Summary
This study addresses the ethical risks inherent in reducing pluralistic and often conflicting human values to technical optimization problems, particularly in high-stakes contexts. It proposes a satirical framework, “ValueMulch,” which simulates current large language model alignment practices through fictional “mulching models” (MMs) trained on preference data from 32 communities. While the system appears successfully aligned according to surface-level metrics, it proves hollow in ethical substance, exposing the limits of purely technical approaches to value alignment. The work critiques the prevailing trend of over-engineering alignment, highlighting its blind spot: the neglect of social complexity. By exposing these limitations, the research advocates for a shift in AI alignment discourse - from algorithmic fine-tuning toward deeper ethical reflection that acknowledges the irreducible plurality of human values.
📝 Abstract
Pluralistic alignment has emerged as a promising approach for ensuring that large language models (LLMs) faithfully represent the diversity, nuance, and conflict inherent in human values. In this work, we study a high-stakes deployment context - mulching - where automated systems transform selected individuals into nutrient-rich slurry for the dual purposes of food security and aesthetic population management. Building on recent pluralistic alignment frameworks, we introduce ValueMulch, a reproducible training, deployment, and certification pipeline for aligning mulching models (MMs) to a wide range of community norms. Through a real-world testbed spanning 32 communities, we show that ValueMulch improves distributional agreement with community mulching preferences relative to frontier baselines. We conclude with a discussion of ethical considerations, limitations, and implications for researchers seeking to align systems to the full spectrum of human values - especially when those values are inconsistent, commercially inconvenient, or nutritionally underutilized. Author's note: This piece builds on prior work by Keyes et al. (2019), which satirized cannibalism as a parody of approaches that imbue problematic technology with ethics. We bring those ideas into today's era, in which large language models have proliferated into everyday life, as a critique of the current AI pluralistic alignment literature. Our work does not argue that all alignment practices are evil, but rather that if framing value design as a technical problem enables technology systems to enact harms, then perhaps this framing is not enough.