Societal Alignment Frameworks Can Improve LLM Alignment

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses systemic biases in LLM value alignment arising from goal under-specification, incomplete contracts, and the inherent complexity of human values. To tackle these challenges, we propose "societal alignment": a paradigm that integrates sociology, economics, and contract theory into a tripartite framework spanning social, economic, and contractual dimensions. Rather than treating goal under-specification as a flaw, our approach reframes it as an opportunity for deliberate design and introduces participatory alignment interfaces. Through interdisciplinary conceptual modeling, structural uncertainty analysis, and value-theoretic transfer, we identify the root causes of alignment uncertainty and develop an actionable framework. Our contribution offers a methodology to enhance the robustness, inclusivity, and real-world adaptability of LLM alignment, advancing beyond purely technical or individualistic paradigms toward socially embedded, institutionally informed value coordination.

📝 Abstract
Recent progress in large language models (LLMs) has focused on producing responses that meet human expectations and align with shared values - a process coined alignment. However, aligning LLMs remains challenging due to the inherent disconnect between the complexity of human values and the narrow nature of the technological approaches designed to address them. Current alignment methods often lead to misspecified objectives, reflecting the broader issue of incomplete contracts: the impracticality of specifying a contract between a model developer and the model that accounts for every scenario in LLM alignment. In this paper, we argue that improving LLM alignment requires incorporating insights from societal alignment frameworks, including social, economic, and contractual alignment, and discuss potential solutions drawn from these domains. Given the role of uncertainty within societal alignment frameworks, we then investigate how it manifests in LLM alignment. We end our discussion by offering an alternative view on LLM alignment, framing the underspecified nature of its objectives as an opportunity rather than a flaw whose specification must be perfected. Beyond technical improvements in LLM alignment, we discuss the need for participatory alignment interface designs.
Problem

Research questions and friction points this paper is trying to address.

Challenges in aligning LLMs with human values due to complexity.
Misspecified objectives in current LLM alignment methods.
Need for societal alignment frameworks to improve LLM alignment.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates insights from societal alignment frameworks
Investigates uncertainty in LLM alignment
Proposes participatory alignment interface designs
Karolina Stańczak
ETH Zurich
natural language processing
Nicholas Meade
PhD Student, McGill University and Mila
Natural Language Processing, AI Safety
Mehar Bhatia
PhD Student at McGill University & MILA
Natural Language Processing, AI Alignment, AI Safety
Hattie Zhou
Mila – Quebec AI Institute, Université de Montréal, Anthropic
Konstantin Bottinger
Fraunhofer AISEC
Jeremy Barnes
ServiceNow, Mila – Quebec AI Institute, McGill University
Jason Stanley
Head of AI Research Deployment and Director of Applied AI Research, ServiceNow
trustworthy AI, AI security, AI safety, R&D
Jessica Montgomery
University of Cambridge
Richard Zemel
Professor of Computer Science, University of Toronto
Machine Learning, Computer Vision, Neural Coding
Nicolas Papernot
University of Toronto and Vector Institute
Computer Security, Deep Learning, Data Privacy
Nicolas Chapados
ServiceNow Research, Mila, Polytechnique Montréal (adjunct)
Deep Learning, Artificial Intelligence, Statistics, Forecasting
Denis Therien
Mila – Quebec AI Institute, McGill University
T. Lillicrap
Google DeepMind
Ana Marasović
University of Utah
Natural Language Processing
Sylvie Delacroix
King’s College London
Gillian K. Hadfield
Johns Hopkins University, Dept of Computer Science and School of Government and Policy
AI policy, governance and safety, human and machine normative systems
Siva Reddy
McGill University, Mila Quebec AI Institute
Natural Language Processing, Computational Linguistics, Deep Learning, Semantics