Is Free Self-Alignment Possible?

📅 2024-06-05
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the reliance of large language model (LLM) alignment on massive human-annotated preference datasets and heavy computation. Methodologically, it introduces a training-free, annotation-free self-alignment framework (AlignEZ) that generates preference signals via self-supervised preference sampling from the model itself and performs representation editing in the latent space, enabling zero-shot multi-objective cooperative control and fine-grained adjustment along preference axes. Its core contribution is the first demonstration of efficient LLM alignment fully independent of external preference data and parameter updates. Experiments show consistent gains of up to 19.9% in general alignment performance and up to 1.9% in mathematical reasoning accuracy, robustness across strong base models, and substantial reductions in both data-curation and computational costs. The method can additionally accelerate conventional alignment pipelines such as DPO.

📝 Abstract
Aligning pretrained language models (LMs) often requires large-scale preference data and substantial computational resources. These costs become even more prohibitive for multi-objective or pluralistic alignment. Is this truly necessary? Can we perform efficient alignment using only internal model capabilities, and without additional training? To answer this question, we propose AlignEZ, a novel approach that leverages (1) self-generated preference data and (2) representation editing to achieve cost-effective, efficient alignment. By operating directly on learned representations, AlignEZ independently targets different behavioral aspects without the overhead of traditional alignment methods. Our experiments reveal that this cost-efficient procedure improves performance across diverse tasks: up to 19.9% on general alignment and 1.9% on challenging mathematical reasoning tasks, even when starting from a strong base model. AlignEZ can also align models to multiple objectives simultaneously, granting fine-grained control over multiple preference axes. Finally, we show that AlignEZ can accelerate more expensive alignment procedures, such as DPO, even under limited availability of ground-truth preference data.
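The representation-editing idea described in the abstract can be illustrated with a minimal sketch: derive a preference direction by contrasting hidden states of self-generated "preferred" and "dispreferred" responses, then shift activations along that direction at inference time, with no parameter updates. Everything here (the dimension, the synthetic clusters, the `edit` helper, the steering strength `alpha`) is a hypothetical stand-in for real LLM activations, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # illustrative hidden dimension

# Stand-ins for hidden states of self-generated preferred vs.
# dispreferred responses (in practice: activations from the LLM itself).
preferred = rng.normal(size=(8, d)) + 1.0   # shifted cluster
dispreferred = rng.normal(size=(8, d))

# A preference direction: mean difference of the two clusters,
# normalized to unit length.
direction = preferred.mean(axis=0) - dispreferred.mean(axis=0)
direction /= np.linalg.norm(direction)

def edit(hidden, alpha=2.0):
    """Steer a hidden state along the preference direction (training-free)."""
    return hidden + alpha * direction

h = rng.normal(size=d)
h_edited = edit(h)

# The edited state projects further onto the preference axis.
print(h_edited @ direction > h @ direction)  # prints True
```

Because the edit is a vector addition in activation space, it needs no gradient steps and no labeled data beyond the model's own sampled responses, which is what makes the procedure training-free and annotation-free.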
Problem

Research questions and friction points this paper is trying to address.

Self-alignment of language models without external data
Cost-effective alignment using internal model capabilities
Simultaneous multi-objective alignment with representation editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Use of self-generated preference data, removing the need for human annotation
Representation editing in latent space, avoiding parameter updates
Simultaneous multi-objective alignment with fine-grained per-axis control
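The multi-objective capability can be sketched by assigning one steering direction per preference axis and combining them with per-axis weights. In this hedged toy example, orthonormal basis vectors stand in for learned directions, and the axis names and weights are purely illustrative.

```python
import numpy as np

d = 16
# Illustrative stand-ins for two learned preference directions;
# real directions would come from contrasting self-generated responses.
helpful = np.eye(d)[0]    # unit vector along axis 0
harmless = np.eye(d)[1]   # unit vector along axis 1

def multi_edit(hidden, weighted_dirs):
    """Cooperatively steer a hidden state along several preference axes."""
    out = hidden.copy()
    for weight, direction in weighted_dirs:
        out = out + weight * direction
    return out

rng = np.random.default_rng(1)
h = rng.normal(size=d)
h2 = multi_edit(h, [(1.5, helpful), (0.5, harmless)])

# Per-axis weights give fine-grained control: with orthonormal
# directions, each projection moves by exactly its own weight.
print(round(float(h2 @ helpful - h @ helpful), 6))
print(round(float(h2 @ harmless - h @ harmless), 6))
```

Dialing a weight up or down trades off that objective independently of the others, which is the "fine-grained control over multiple preference axes" the abstract claims.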