Icon$^{2}$: Aligning Large Language Models Using Self-Synthetic Preference Data via Inherent Regulation

📅 2025-09-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address key challenges in preference data construction (instruction distribution mismatch with the target model, the high computational cost of response generation, and the inefficiency of multi-path sampling), this paper proposes Icon$^{2}$, a lightweight preference data synthesis framework grounded in the inherent regulation of large language models' (LLMs') representation spaces. Methodologically, it introduces (1) layer-wise direction vectors that explicitly encode human preferences; (2) a consistency-based filtering strategy for self-synthesized instructions that removes reliance on manual annotation and mitigates distribution shift; and (3) bidirectional inherent-control decoding for efficient, high-quality paired-response generation. Experiments on Llama3-8B and Qwen2-7B show significant gains: a +13.89% average win-rate improvement on AlpacaEval 2.0 and +13.45% on Arena-Hard, with up to a 48.1% reduction in inference cost, substantially outperforming conventional preference data construction approaches.

📝 Abstract
Large Language Models (LLMs) require high-quality preference datasets to align with human preferences. However, conventional methods for constructing such datasets face significant challenges: reliance on pre-collected instructions often leads to distribution mismatches with target models, while the need to sample multiple stochastic responses introduces substantial computational overhead. In this work, we explore a paradigm shift by leveraging the inherent regulation of LLMs' representation space for efficient and tailored preference dataset construction, named Icon$^{2}$. Specifically, it first extracts layer-wise direction vectors to encode sophisticated human preferences and then uses these vectors to filter self-synthesized instructions based on their inherent consistency. During decoding, bidirectional inherent control is applied to steer token representations, enabling the precise generation of response pairs with clear alignment distinctions. Experimental results demonstrate significant improvements in both alignment and efficiency. Llama3-8B and Qwen2-7B achieve an average win-rate improvement of 13.89% on AlpacaEval 2.0 and 13.45% on Arena-Hard, while reducing computational costs by up to 48.1%.
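The first two steps of the pipeline (layer-wise direction extraction and consistency-based instruction filtering) can be sketched in a few lines. This is a minimal illustration only, not the paper's implementation: it assumes hidden states have already been collected from contrastive exemplars, and every name here (`direction_vectors`, `consistency_score`, `filter_instructions`, the threshold `tau`) is hypothetical.

```python
import numpy as np

def direction_vectors(aligned, misaligned):
    """Layer-wise preference directions from contrastive hidden states.

    aligned / misaligned: arrays of shape (n_samples, n_layers, d) holding
    hidden states for preference-aligned vs. misaligned exemplars.
    Returns unit-norm direction vectors of shape (n_layers, d).
    """
    diff = aligned.mean(axis=0) - misaligned.mean(axis=0)  # (n_layers, d)
    return diff / np.linalg.norm(diff, axis=-1, keepdims=True)

def consistency_score(instr_states, dirs):
    """Mean layer-wise cosine similarity between one instruction's hidden
    states (n_layers, d) and the preference directions."""
    unit = instr_states / np.linalg.norm(instr_states, axis=-1, keepdims=True)
    return float(np.mean(np.sum(unit * dirs, axis=-1)))

def filter_instructions(states_batch, dirs, tau=0.1):
    """Keep only self-synthesized instructions whose representations are
    consistent with the preference directions (score above tau)."""
    return [i for i, s in enumerate(states_batch)
            if consistency_score(s, dirs) > tau]
```

Taking the mean hidden-state difference between contrastive examples, then scoring new inputs by cosine similarity, is a standard representation-steering recipe; the paper's actual extraction and filtering criteria may differ.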
Problem

Research questions and friction points this paper is trying to address.

Reducing computational overhead in LLM preference alignment
Addressing distribution mismatches in preference dataset construction
Generating precise response pairs with clear alignment distinctions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts layer-wise direction vectors to encode human preferences
Filters self-synthesized instructions using inherent consistency vectors
Applies bidirectional inherent control during token decoding
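The last bullet, bidirectional control during decoding, can be read as shifting a token's hidden state along the preference direction with opposite signs, so one forward state yields both a "chosen" and a "rejected" continuation instead of repeated stochastic sampling. Below is a minimal single-layer sketch under that reading; `steer`, `alpha`, and `w_out` are illustrative names, not the paper's API.

```python
import numpy as np

def steer(hidden, direction, alpha):
    """Shift a hidden state (d,) along a unit preference direction by alpha."""
    return hidden + alpha * direction

def paired_next_token_logits(hidden, direction, w_out, alpha=4.0):
    """Bidirectional control: steer toward (+alpha) and away from (-alpha)
    the preference direction, producing next-token logits for a
    chosen/rejected response pair from a single hidden state.

    hidden: (d,) final-layer state; w_out: (vocab, d) output projection.
    """
    chosen = steer(hidden, direction, +alpha) @ w_out.T
    rejected = steer(hidden, direction, -alpha) @ w_out.T
    return chosen, rejected
```

By construction the two logit vectors differ only by `2 * alpha * (w_out @ direction)`, which is what gives the pair its clear, controllable alignment distinction at low cost.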
Qiyuan Chen
Zhejiang University, Zhejiang, China
Hongsen Huang
Soochow Securities Co., Ltd., Jiangsu, China
Qian Shao
Zhejiang University, Zhejiang, China
Jiahe Chen
Zhejiang University, Zhejiang, China
Jintai Chen
Assistant Professor@HKUST(GZ)
AI for Healthcare · Multimodal Learning · Deep Tabular Learning
Hongxia Xu
Zhejiang University
AI4Science · Nanomedicine · Medical Imaging
Renjie Hua
Soochow Securities Co., Ltd., Jiangsu, China; Nanjing University, Jiangsu, China
Chuan Ren
Soochow Securities Co., Ltd., Jiangsu, China
Jian Wu
Zhejiang University, Zhejiang, China