Full-Stack Alignment: Co-Aligning AI and Institutions with Thick Models of Value

📅 2025-12-03
📈 Citations: 1
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses governance risks that arise when AI systems are misaligned with the values of societal institutions and individuals. To tackle this, it proposes a "full-stack alignment" framework built on "thick models of value," which distinguish enduring values from context-sensitive preferences, support principled normative reasoning, and model collective goods across levels. Methodologically, the framework integrates value-sensitive decision architectures, socially embedded agent design, and value-aware institutional and economic mechanisms, and is demonstrated across five domains: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions. The authors argue that this approach improves value consistency between AI development and societal well-being, offering a theoretically grounded yet practically viable alignment paradigm for trustworthy AI that bridges normative theory, institutional design, and technical implementation.

📝 Abstract
Beneficial societal outcomes cannot be guaranteed by aligning individual AI systems with the intentions of their operators or users. Even an AI system that is perfectly aligned to the intentions of its operating organization can lead to bad outcomes if the goals of that organization are misaligned with those of other institutions and individuals. For this reason, we need full-stack alignment, the concurrent alignment of AI systems and the institutions that shape them with what people value. This can be done without imposing a particular vision of individual or collective flourishing. We argue that current approaches for representing values, such as utility functions, preference orderings, or unstructured text, struggle to address these and other issues effectively. They struggle to distinguish values from other signals, to support principled normative reasoning, and to model collective goods. We propose thick models of value will be needed. These structure the way values and norms are represented, enabling systems to distinguish enduring values from fleeting preferences, to model the social embedding of individual choices, and to reason normatively, applying values in new domains. We demonstrate this approach in five areas: AI value stewardship, normatively competent agents, win-win negotiation systems, meaning-preserving economic mechanisms, and democratic regulatory institutions.
Problem

Research questions and friction points this paper is trying to address.

Aligning AI systems with institutional and individual values to prevent societal harm.
Addressing limitations of current value representation methods like utility functions.
Developing thick models of value for normative reasoning and collective goods.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Full-stack alignment of AI and institutions
Thick models for representing values and norms
Structured value representation enabling normative reasoning
Joe Edelman
Meaning Alignment Institute
Zhi-Xuan Tan
Massachusetts Institute of Technology
Ryan Lowe
Meaning Alignment Institute
Oliver Klingefjord
University of Oxford
Vincent Wang-Mašcianica
University College London
Matija Franklin
Google DeepMind
R. Kearns
University of Oxford
Ellie Hain
University of Oxford
Atrisha Sarkar
Western University
Michiel A. Bakker
Massachusetts Institute of Technology
Fazl Barez
University of Oxford
D. Duvenaud
University of Toronto
J. Foerster
University of Oxford
Iason Gabriel
Senior Staff Research Scientist, Google DeepMind
Joseph Gubbels
McGill University
B. Goodman
University of Oxford
Andreas Haupt
Stanford University
J. Heitzig
Potsdam Institute for Climate Impact Research
J. Jara-Ettinger
Yale University
Atoosa Kasirzadeh
Carnegie Mellon University
J. Kirkpatrick
University of Oxford
Andrew Koh
Massachusetts Institute of Technology
W. B. Knox
UT Austin
Philipp Koralus
University of Oxford
Joel Lehman
Nephesh
Sydney Levine
Visiting Research Scientist, Google DeepMind
Samuele G. Marro
University of Oxford
Manon Revel
Massachusetts Institute of Technology
Toby Shorin
University of Oxford
Morgan Sutherland
University of Oxford
Michael Henry Tessler
DeepMind
Ivan Vendrov
Midjourney
James Wilken-Smith
University of Oxford