Like Father, Like Son: Kinship-Aware Preference Mapping (KARMA) for Automatic Alignment in Large Language Models

๐Ÿ“… 2025-02-26
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Existing LLM alignment methods often compare model responses across models with substantially heterogeneous capabilities, yielding ambiguous and weak preference signals. To address this, we propose a fine-grained preference data construction paradigm grounded in โ€œmodel capability affinityโ€: we define an affinity metric jointly capturing response complexity and quality, and dynamically pair and annotate preferences only among outputs from models of comparable capability. Building on this, we design a unified optimization framework integrating preference distillation and reinforcement learning. Our approach is the first to constrain alignment within a homogenized response space, thereby significantly improving win-rate consistency (+12.3%) and human preference alignment (+9.8%), while enhancing training stability and result interpretability. This work establishes a novel pathway toward high signal-to-noise-ratio, interpretable LLM alignment.

Technology Category

Application Category

๐Ÿ“ Abstract
Recent advancements in Large Language Model (LLM) alignment have sought to mitigate the cost of human annotations by leveraging pretrained models to generate preference data. However, existing methods often compare responses from models with substantially different capabilities, yielding superficial distinctions that fail to provide meaningful guidance on what constitutes a superior response. To address this limitation, we propose Kinship-Aware pReference MApping (KARMA), a novel framework that systematically pairs responses from models with comparable competencies. By constraining preference comparisons to outputs of similar complexity and quality, KARMA enhances the informativeness of preference data and improves the granularity of alignment signals. Empirical evaluations demonstrate that our kinship-aware approach leads to more consistent and interpretable alignment outcomes, ultimately facilitating a more principled and reliable pathway for aligning LLM behavior with human preferences.
Problem

Research questions and friction points this paper is trying to address.

Reducing human annotation costs in LLM alignment
Improving meaningful preference data generation
Enhancing alignment consistency and interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Kinship-Aware Reference Mapping
Comparable Competency Pairing
Enhanced Preference Data Informativeness
๐Ÿ”Ž Similar Papers
2024-06-05arXiv.orgCitations: 1