Amulet: ReAlignment During Test Time for Personalized Preference Adaptation of LLMs

📅 2025-02-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of real-time adaptation of large language models (LLMs) to dynamic, personalized user preferences—such as cultural norms, value systems, or temporal relevance—this paper proposes a training-free, test-time alignment framework. The method models each token generation step as an online learning process guided by user-provided prompts, introducing the first token-level, prompt-guided, training-free test-time preference realignment mechanism. The authors derive a closed-form solution for each optimization step, enabling millisecond-scale, low-overhead adaptation. Extensive experiments across multiple LLMs, diverse benchmark datasets, and heterogeneous preference configurations demonstrate significant improvements in alignment performance. Crucially, the approach incurs negligible computational overhead, preserves generation quality, and maintains inference efficiency—without requiring parameter updates or retraining.
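The token-level idea can be pictured with a minimal sketch. Note this is an illustrative assumption, not the paper's exact closed form: it treats the per-token decoding distribution as the variable of an online learning problem, using the log-probability gap between a preference-prompted forward pass and the base forward pass as the reward, and applies a closed-form exponentiated-gradient update at each iteration. All names (`amulet_step`, `alpha`, `beta`) are hypothetical.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over a logit vector
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def log_normalize(log_p):
    # renormalize log-probabilities via log-sum-exp
    m = log_p.max()
    return log_p - (m + np.log(np.exp(log_p - m).sum()))

def amulet_step(base_logits, pref_logits, alpha=0.1, beta=1.0, iters=10):
    """Sketch of one token-level test-time realignment step.

    base_logits: next-token logits from the base prompt.
    pref_logits: next-token logits when the user's preference prompt
                 is prepended (assumed available from a second pass).
    Each iteration applies a closed-form multiplicative update
        p <- normalize(p * exp(alpha * beta * reward)),
    pulling the distribution toward the preference signal while
    starting from (and staying anchored to) the base distribution.
    """
    reward = pref_logits - base_logits          # per-token preference signal
    log_p = np.log(softmax(base_logits))        # start from the base distribution
    for _ in range(iters):
        log_p = log_normalize(log_p + alpha * beta * reward)
    return np.exp(log_p)
```

No parameters are updated: the correction lives entirely in the decoding distribution, which is what makes the adaptation training-free and cheap enough to run per token.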

📝 Abstract
How to align large language models (LLMs) with user preferences from a static general dataset has been frequently studied. However, user preferences are usually personalized, changing, and diverse regarding culture, values, or time. This leads to the problem that the actual user preferences often do not coincide with those trained by the model developers in the practical use of LLMs. Since we cannot collect enough data and retrain for every demand, researching efficient real-time preference adaptation methods based on the backbone LLMs during test time is important. To this end, we introduce Amulet, a novel, training-free framework that formulates the decoding process of every token as a separate online learning problem with the guidance of simple user-provided prompts, thus enabling real-time optimization to satisfy users' personalized preferences. To reduce the computational cost brought by this optimization process for each token, we additionally provide a closed-form solution for each iteration step of the optimization process, thereby reducing the computational time cost to a negligible level. The detailed experimental results demonstrate that Amulet can achieve significant performance improvements in rich settings with combinations of different LLMs, datasets, and user preferences, while maintaining acceptable computational efficiency.
Problem

Research questions and friction points this paper is trying to address.

Align LLMs with changing user preferences
Enable real-time personalization during test time
Reduce computational cost of real-time optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free framework
Real-time preference adaptation
Closed-form solution optimization
Zhaowei Zhang
Peking University
AI Governance · AI Alignment · Game Theory · Human-AI Collaboration
Fengshuo Bai
Shanghai Jiao Tong University
Embodied AI · AI Alignment · Reinforcement Learning · Preference-based Learning
Qizhi Chen
PhD Candidate, Zhejiang University
Multimodal Reasoning · Embodied AI · 3D Vision
Chengdong Ma
Peking University
Reinforcement Learning · Multi-Agent Systems
Mingzhi Wang
Institute for Artificial Intelligence, Peking University
Haoran Sun
Institute for Artificial Intelligence, Peking University
Zilong Zheng
National Key Laboratory of General Artificial Intelligence, BIGAI
Yaodong Yang
Institute for Artificial Intelligence, Peking University