🤖 AI Summary
This paper addresses the long-overlooked central role of data in AI alignment, identifying systemic unreliability in human and AI feedback, stemming from subjective biases, temporal drift, context dependence, and model-induced value distortion. To address this, we propose the first data-centric AI alignment framework, systematically characterizing distortion mechanisms across heterogeneous feedback sources and establishing three novel research directions: optimized feedback acquisition, robust data cleaning, and feedback validation. Methodologically, we integrate human factors analysis, temporal behavioral modeling, and feedback consistency assessment. Our work rigorously uncovers the fundamental causal pathway through which data quality governs alignment efficacy, thereby providing both a theoretical foundation and a practical, implementable roadmap for building trustworthy, stable, and value-consistent AI systems.
📝 Abstract
As AI systems become increasingly capable and influential, ensuring their alignment with human values, preferences, and goals has become a critical research focus. Current alignment methods primarily focus on designing algorithms and loss functions but often underestimate the crucial role of data. This paper advocates for a shift towards data-centric AI alignment, emphasizing the need to enhance the quality and representativeness of data used in aligning AI systems. In this position paper, we highlight key challenges associated with both human-based and AI-based feedback within the data-centric alignment framework. Through qualitative analysis, we identify multiple sources of unreliability in human feedback, as well as problems related to temporal drift, context dependence, and AI-based feedback failing to capture human values due to inherent model limitations. We propose future research directions, including improved feedback collection practices, robust data-cleaning methodologies, and rigorous feedback verification processes. We call for future research into these critical directions to address gaps that persist in understanding and improving data-centric alignment practices.