🤖 AI Summary
Problem: Aligned large language models (LLMs) exhibit rigid alignment behavior: once a model is aligned, the degree of alignment cannot be quantitatively controlled during either training or inference.
Method: We propose TrRa+InRa, a two-stage controllable alignment framework enabling *training-time controllable alignment optimization* and *inference-time real-time switching between cognitive modes* (e.g., fast vs. slow thinking). The approach combines controllable logit interpolation, identity-initialized parallel adapter layers with logit-level fusion, reference-model distillation, and joint training.
Contribution/Results: Experiments show that DeepSeek-R1-Distill-Qwen-1.5B achieves zero performance degradation while reducing token consumption by 54.63%. For the 7B variant, dynamic alignment adjustment during inference yields accuracy exceeding the slow-thinking baseline. This work establishes the first end-to-end, quantifiable, and intervenable paradigm for flexible LLM alignment, enabling precise, human-controllable modulation of model behavior across the full training-inference pipeline.
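The core mechanism shared by both stages is a controllable fusion of logits from two models. A minimal sketch of that idea, assuming a simple convex combination with a user-chosen coefficient `lam` (the function name and the linear form are illustrative, not the paper's exact formulation):

```python
import numpy as np

def interpolate_logits(ref_logits, aligned_logits, lam):
    """Controllable logit fusion: lam = 0 recovers the reference model,
    lam = 1 the fully aligned model, intermediate values blend the two.
    (Illustrative convex combination; the paper may use a different rule.)"""
    return (1.0 - lam) * ref_logits + lam * aligned_logits

# Toy next-token logits over a 3-word vocabulary.
ref = np.array([2.0, 0.5, -1.0])
aligned = np.array([0.0, 3.0, 1.0])
fused = interpolate_logits(ref, aligned, 0.5)  # halfway between the two models
```

Sweeping `lam` is what makes the alignment degree a continuous, quantitative knob rather than a binary aligned/unaligned choice.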
📝 Abstract
Realignment becomes necessary when a language model (LM) fails to meet expected performance. We propose a flexible realignment framework that supports quantitative control of alignment degree during training and inference. This framework incorporates Training-time Realignment (TrRa), which efficiently realigns the reference model by leveraging the controllable fusion of logits from both the reference and already aligned models. For example, TrRa reduces token usage by 54.63% on DeepSeek-R1-Distill-Qwen-1.5B without any performance degradation, outperforming DeepScaleR-1.5B's 33.86%. To complement TrRa during inference, we introduce a layer adapter that enables smooth Inference-time Realignment (InRa). The adapter is initialized to perform an identity transformation and is inserted before the original bottom layer. During inference, input embeddings are processed simultaneously by the adapter path and the original layer, both followed by the remaining layers, and the resulting outputs are controllably interpolated at the logit level. We upgraded DeepSeek-R1-Distill-Qwen-7B from a slow-thinking model to one that supports both fast and slow thinking, allowing flexible alignment control even during inference. By encouraging deeper reasoning, it even surpassed its original performance.
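The InRa adapter described above can be sketched as follows. This is a minimal toy, assuming the model collapses to two matrix multiplications and reading "identity initialization" as an identity weight with zero bias, so that at initialization the adapter path reproduces the original path exactly; all names (`IdentityAdapter`, `forward`, `inra_logits`, `alpha`) are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, VOCAB = 4, 6

# Hypothetical stand-ins for the frozen model (not the paper's code):
W_bottom = rng.normal(size=(DIM, DIM))   # original bottom layer
W_rest = rng.normal(size=(DIM, VOCAB))   # remaining layers + LM head, collapsed

class IdentityAdapter:
    """Parallel adapter inserted before the bottom layer, initialized as
    an identity map so it is a no-op until it is trained."""
    def __init__(self, dim):
        self.W = np.eye(dim)     # identity weight
        self.b = np.zeros(dim)   # zero bias

    def __call__(self, x):
        return x @ self.W + self.b

def forward(x):
    """Original model: bottom layer, then the collapsed remaining layers."""
    return (x @ W_bottom) @ W_rest

def inra_logits(x, adapter, alpha):
    """Run both paths through the shared remaining layers, then interpolate
    at the logit level; alpha controls the realignment degree at inference."""
    return (1.0 - alpha) * forward(x) + alpha * forward(adapter(x))
```

Because the adapter starts as an identity map, any choice of `alpha` leaves the untrained model's behavior unchanged; training the adapter then gives `alpha` its meaning as a smooth dial between the original and realigned behaviors (e.g., slow vs. fast thinking).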