🤖 AI Summary
Existing preference learning methods, such as Direct Preference Optimization (DPO), lack theoretical consistency under neural network hypothesis classes, making it difficult to guarantee generalization performance. This work formulates the alignment of large language models as a ranking problem with margin offsets and introduces a structure-aware objective, SA-DPO. The proposed approach combines a dynamic margin that depends on the semantic distance between responses with a polynomial hinge loss. It establishes, for the first time, a structure-aware H-consistency theory, deriving a tight consistency bound that explicitly depends on the margin parameter γ. The analysis demonstrates that the method offers stronger theoretical guarantees and improved empirical performance, particularly in handling synonymous and challenging preference pairs.
📝 Abstract
Preference learning has become the foundation for aligning Large Language Models (LLMs) with human intent. Popular methods, such as Direct Preference Optimization (DPO), minimize surrogate losses as proxies for the intractable pairwise ranking loss. However, we demonstrate that for the equicontinuous hypothesis sets typical of neural networks, these standard surrogates are theoretically inconsistent, yielding vacuous generalization guarantees. To resolve this, we formulate LLM alignment within a margin-shifted ranking framework. We derive rigorous $H$-consistency bounds that depend on enforcing a separation margin $\gamma$. Crucially, we extend this to Structure-Aware $H$-consistency, introducing a novel objective (SA-DPO) that adapts the margin based on the semantic distance between responses to handle synonyms and hard pairs. Finally, we analyze the trade-off between consistency and model limitations via the Margin-Capacity Profile, proving that heavy-tailed surrogates (such as the Polynomial Hinge family) offer superior consistency guarantees for capacity-bounded models compared to the standard logistic loss used in DPO.
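To make the idea of a margin-shifted, distance-adaptive objective concrete, the following is a minimal illustrative sketch, not the paper's actual SA-DPO definition. It assumes the usual DPO reward gap (scaled difference of policy and reference log-ratios) and a hypothetical linear rule in which the margin is `gamma` times a semantic distance in [0, 1]. The function names, the linear margin rule, and the `beta` default are all assumptions made for illustration.

```python
import math


def dpo_reward_gap(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Usual DPO quantity: beta * (policy log-ratio - reference log-ratio)."""
    return beta * ((logp_w - logp_l) - (ref_logp_w - ref_logp_l))


def logistic_loss(u):
    """Logistic surrogate -log(sigmoid(u)), computed in a numerically stable way."""
    return max(-u, 0.0) + math.log1p(math.exp(-abs(u)))


def adaptive_margin(gamma, semantic_distance):
    """Hypothetical rule: the margin grows linearly with semantic distance.

    Near-synonymous responses (distance close to 0) get almost no margin;
    clearly different responses (distance close to 1) get the full gamma.
    """
    return gamma * semantic_distance


def margin_shifted_loss(u, gamma, semantic_distance, surrogate=logistic_loss):
    """Apply a surrogate to the reward gap after subtracting the margin."""
    return surrogate(u - adaptive_margin(gamma, semantic_distance))


# Same reward gap, two pairs: near-synonyms vs. clearly different responses.
gap = dpo_reward_gap(-1.0, -2.0, -1.5, -1.5, beta=0.1)
loss_synonyms = margin_shifted_loss(gap, gamma=1.0, semantic_distance=0.0)
loss_distinct = margin_shifted_loss(gap, gamma=1.0, semantic_distance=1.0)
```

In this sketch a pair of near-synonymous responses is penalized exactly as in plain DPO, while a semantically distant pair must be separated by a larger gap before its loss becomes small. A different surrogate can be passed through the `surrogate` argument in place of the logistic loss.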