Decoding Alignment: A Critical Survey of LLM Development Initiatives through Value-setting and Data-centric Lens

📅 2025-08-23
🤖 AI Summary
This study examines two core dimensions of large language model (LLM) alignment: value specification and data construction. Through a systematic audit of publicly available technical documentation from six mainstream LLM projects, covering both proprietary and open-weight models, it combines literature review, qualitative content analysis, and cross-case comparison to extract and code alignment goal definitions, data provenance, annotation protocols, and value trade-off strategies. The authors introduce a "values and data centered" dual-lens analytical framework, revealing significant disparities between proprietary and open-weight models in how value objectives are selected and training data is governed. Building on this, they develop a socio-technical critical alignment framework that identifies prevalent patterns, including ambiguous value articulation, insufficient data traceability, and opaque trade-offs, as well as the structural risks these entail. The findings provide theoretical grounding and actionable guidance for normative, auditable, and responsible LLM alignment practices.

📝 Abstract
AI Alignment, primarily in the form of Reinforcement Learning from Human Feedback (RLHF), has been a cornerstone of the post-training phase in developing Large Language Models (LLMs). It has also been a popular research topic across various disciplines beyond Computer Science, including Philosophy and Law, among others, highlighting the socio-technical challenges involved. Nonetheless, except for the computational techniques related to alignment, there has been limited focus on the broader picture: the scope of these processes, which primarily rely on the selected objectives (values), and the data collected and used to imprint such objectives into the models. This work aims to reveal how alignment is understood and applied in practice from a value-setting and data-centric perspective. For this purpose, we investigate and survey ('audit') publicly available documentation released by 6 LLM development initiatives by 5 leading organizations shaping this technology, focusing on proprietary (OpenAI's GPT, Anthropic's Claude, Google's Gemini) and open-weight (Meta's Llama, Google's Gemma, and Alibaba's Qwen) initiatives, all published in the last 3 years. The findings are documented in detail per initiative, while there is also an overall summary concerning different aspects, mainly from a value-setting and data-centric perspective. On the basis of our findings, we discuss a series of broader related concerns.
Problem

Research questions and friction points this paper is trying to address.

Examining how AI alignment is implemented in practice through value-setting
Investigating data-centric approaches across LLM development initiatives
Auditing alignment processes in both proprietary and open-weight LLM projects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveying LLM alignment from a value-setting perspective
Auditing data-centric practices in RLHF pipelines
Comparing publicly available documentation across proprietary and open-weight models