AI Summary
This work addresses the challenge that existing large language model alignment methods inadequately capture the diverse values of multiple stakeholders. To this end, the paper proposes the Grounded Constitutional AI (GCAI) framework, which uniquely integrates users' explicit value statements with the underlying rationales for their preferences. By extending the Inverse Constitutional AI approach and leveraging natural language processing, GCAI generates alignment principles that are both universally applicable and context-sensitive. Experimental results demonstrate that constitutions produced by GCAI significantly outperform baseline methods in human evaluations, achieving higher scores across dimensions including moral soundness, coherence, diversity, personal relevance, and willingness to deploy. This framework offers a novel pathway toward building more inclusive and ethically grounded AI alignment systems.
Abstract
A crucial consideration when developing and deploying Large Language Models (LLMs) is the human values to which these models are aligned. In the constitutional framework of alignment, models are aligned to a set of principles (the constitution) specified in natural language. However, it is unclear how to fairly determine this constitution with widespread stakeholder input. In this work we propose Grounded Constitutional AI (GCAI), a unified framework for generating constitutions of principles that are representative of both users' general expectations toward AI (general principles) and their interaction-time preferences (contextual principles). We extend the Inverse Constitutional AI (ICAI) approach to generate contextual principles from human preference annotation data by leveraging human-provided *reasons* for their preferences. We supplement these contextual principles with general principles surfaced from user statements of *values* regarding AI. We show that a constitution generated by GCAI is preferred by humans over one generated through ICAI, both personally and for widespread use in governing AI behavior. Additionally, participants consider the GCAI constitution to be more morally grounded, coherent, and pluralistic.
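The two-source structure described above can be sketched in code. This is a minimal illustrative sketch, not the paper's implementation: the function names, data shapes, and templating are all assumptions, and in practice the paper derives principles with natural language processing rather than string templates.

```python
# Hypothetical sketch of the GCAI pipeline: combine contextual principles
# (derived from preference annotations and their human-provided reasons,
# ICAI-style) with general principles (surfaced from user value statements).
# All names and templates here are illustrative assumptions.

def contextual_principles(annotations):
    """Turn each preference annotation's stated reason into a
    contextual (interaction-time) principle."""
    return [f"Prefer responses that {a['reason']}" for a in annotations]

def general_principles(value_statements):
    """Turn each user value statement into a general principle
    expressing an expectation toward AI behavior."""
    return [f"The AI should {v}" for v in value_statements]

def generate_constitution(annotations, value_statements):
    """Merge both principle types into one natural-language constitution."""
    return general_principles(value_statements) + contextual_principles(annotations)

# Toy example data (fabricated for illustration only)
annotations = [{"chosen": "A", "rejected": "B", "reason": "cite their sources"}]
values = ["respect user autonomy"]
print(generate_constitution(annotations, values))
```

The design point this sketch captures is that the constitution is a union of principles grounded in two distinct kinds of human input, rather than being inferred from preference labels alone.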