AI Summary
This work addresses the challenge that existing large language model alignment methods inadequately capture the diverse values of multiple stakeholders. To this end, the paper proposes the Grounded Constitutional AI (GCAI) framework, which uniquely integrates users' explicit value statements with the underlying rationales for their preferences. By extending the Inverse Constitutional AI approach and leveraging natural language processing, GCAI generates alignment principles that are both universally applicable and context-sensitive. Experimental results demonstrate that constitutions produced by GCAI significantly outperform baseline methods in human evaluations, achieving higher scores across dimensions including moral soundness, coherence, diversity, personal relevance, and willingness to deploy. This framework offers a novel pathway toward building more inclusive and ethically grounded AI alignment systems.
Abstract
A crucial consideration when developing and deploying Large Language Models (LLMs) is the human values to which these models are aligned. In the constitutional framework of alignment, models are aligned to a set of principles (the constitution) specified in natural language. However, it is unclear how to fairly determine this constitution with widespread stakeholder input. In this work we propose Grounded Constitutional AI (GCAI), a unified framework for generating constitutions of principles that are representative of both users' general expectations toward AI (general principles) and their interaction-time preferences (contextual principles). We extend the Inverse Constitutional AI (ICAI) approach to generate contextual principles from human preference annotation data by leveraging human-provided *reasons* for their preferences. We supplement these contextual principles with general principles surfaced from user statements of *values* regarding AI. We show that a constitution generated by GCAI is preferred by humans over one generated through ICAI, both personally and for widespread use in governing AI behavior. Additionally, participants consider the GCAI constitution to be more morally grounded, coherent, and pluralistic.
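The two-source structure described above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: the function names, data shapes, and templating are all assumptions, and in practice the paper derives principles with natural language processing rather than string templates.

```python
# Hypothetical sketch of the GCAI pipeline: combine contextual principles
# (derived from preference annotations and their human-provided reasons,
# ICAI-style) with general principles (surfaced from user value statements).
# All names and templates here are illustrative assumptions.

def contextual_principles(annotations):
    """Turn each preference annotation's stated reason into a
    contextual (interaction-time) principle."""
    return [f"Prefer responses that {a['reason']}" for a in annotations]

def general_principles(value_statements):
    """Turn each user value statement into a general principle
    expressing an expectation toward AI behavior."""
    return [f"The AI should {v}" for v in value_statements]

def generate_constitution(annotations, value_statements):
    """Merge both principle types into one natural-language constitution."""
    return general_principles(value_statements) + contextual_principles(annotations)

# Toy example data (fabricated for illustration only)
annotations = [{"chosen": "A", "rejected": "B", "reason": "cite their sources"}]
values = ["respect user autonomy"]
print(generate_constitution(annotations, values))
```

The design point this sketch captures is that the constitution is a union of principles grounded in two distinct kinds of human input, rather than being inferred from preference labels alone.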