🤖 AI Summary
This paper addresses systemic biases in LLM value alignment arising from goal under-specification, incomplete contracts, and the inherent complexity of human values. To tackle these challenges, we propose "societal alignment," a paradigm that integrates insights from sociology, economics, and contract theory into a tripartite framework spanning social, economic, and contractual alignment. Rather than treating goal under-specification as a flaw, our approach reframes it as an opportunity for deliberate design and introduces participatory alignment interfaces. Through interdisciplinary conceptual modeling, structural uncertainty analysis, and value-theoretic transfer, we identify the root causes of alignment uncertainty and develop an actionable framework. Our contribution offers a methodology for improving the robustness, inclusivity, and real-world adaptability of LLM alignment, advancing beyond purely technical or individualistic alignment paradigms toward socially embedded, institutionally informed value coordination.
📝 Abstract
Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values, a process termed alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader problem of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and we discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view of LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than a flaw to be resolved through ever more precise specification. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.