Token-weighted Direct Preference Optimization with Attention

📅 2026-05-20

🤖 AI Summary
This work addresses a key limitation of conventional Direct Preference Optimization (DPO): it treats all tokens in a response uniformly even though they differ in importance. Existing token-level preference optimization approaches rely on position-based heuristics or separately trained auxiliary models, which limits robustness and adds training cost. The paper proposes Token-weighted DPO (TwDPO), a training objective grounded in token-weighted RL, and AttentionPO, an instantiation that derives token weights from the LLM's own attention when it is prompted to act as a pairwise judge. The weights are therefore content-aware and need no separately trained model, at a cost of two extra forward passes per example. The authors report significant improvements on AlpacaEval, MT-Bench, and ArenaHard, surpassing existing preference optimization methods.
📝 Abstract
Direct Preference Optimization (DPO) aligns Large Language Models with human preferences without the need for a separate reward model. However, DPO treats all tokens in responses equally, neglecting the differing importance of individual tokens. Existing token-level PO methods compute the token weights using either token-position-based heuristic functions or probability estimates given by a separately trained model, which lack robustness and incur extra training cost. In contrast, we propose Token-weighted DPO (TwDPO) -- a novel training objective grounded in token-weighted RL -- and AttentionPO -- an instantiation of TwDPO that uses attention from the LLM itself to estimate token weights. AttentionPO prompts the LLM to serve as a pairwise judge and checks where the model attends when comparing the responses. This design makes AttentionPO content-aware, adjusting weights based on response content, and efficient, incurring only two extra forward passes per example. Experimental results show that AttentionPO significantly improves performance on AlpacaEval, MT-Bench, and ArenaHard, surpassing existing Preference Optimization methods.
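
The abstract does not state the TwDPO objective, so the following is only an illustrative sketch of what a token-weighted DPO loss could look like, not the paper's formulation. It assumes the weights simply scale each token's policy-to-reference log-ratio before the usual DPO sigmoid loss, and that attention mass per response token is normalized so the weights average to 1. The function names (`weighted_logratio`, `token_weighted_dpo_loss`, `weights_from_attention`), the normalization, and the default `beta` are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (policy log-probs, reference log-probs, token weights), one entry per response token
Response = Tuple[Sequence[float], Sequence[float], Sequence[float]]


def weights_from_attention(attn_mass: Sequence[float]) -> List[float]:
    """Assumed normalization: scale attention mass so weights average to 1.

    Falls back to uniform weights (plain DPO) if no attention mass is present.
    """
    n = len(attn_mass)
    total = sum(attn_mass)
    if n == 0 or total <= 0.0:
        return [1.0] * n
    return [n * a / total for a in attn_mass]


def weighted_logratio(policy_logps: Sequence[float],
                      ref_logps: Sequence[float],
                      weights: Sequence[float]) -> float:
    """Sum over tokens of w_t * (log pi(y_t) - log pi_ref(y_t))."""
    if not (len(policy_logps) == len(ref_logps) == len(weights)):
        raise ValueError("per-token sequences must have equal length")
    return sum(w * (p - r) for p, r, w in zip(policy_logps, ref_logps, weights))


def token_weighted_dpo_loss(chosen: Response, rejected: Response,
                            beta: float = 0.1) -> float:
    """-log sigmoid(beta * (weighted chosen ratio - weighted rejected ratio))."""
    margin = beta * (weighted_logratio(*chosen) - weighted_logratio(*rejected))
    # softplus(-margin), written in a numerically stable form
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Usage: with all weights equal to 1 this reduces to the standard DPO loss.
chosen = ([-1.0, -2.0], [-1.5, -2.5], weights_from_attention([1.0, 1.0]))
rejected = ([-1.0], [-1.0], [1.0])
loss = token_weighted_dpo_loss(chosen, rejected, beta=0.1)
```

In this sketch, uniform weights recover ordinary sequence-level DPO, which makes the role of the weights easy to isolate: only tokens with above-average attention mass contribute more than they would under plain DPO.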
Problem

Research questions and friction points this paper is trying to address.

Direct Preference Optimization
token weighting
Large Language Models
preference alignment
attention mechanism
Innovation

Methods, ideas, or system contributions that make the work stand out.

Token-weighted DPO
Attention-based weighting
Preference Optimization
Content-aware alignment
Efficient LLM training