Response Time Enhances Alignment with Heterogeneous Preferences

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a key limitation of current large language model (LLM) alignment methods. These methods assume that annotators share homogeneous preferences, so they fail to recover the average preference of the anonymous, heterogeneous populations that supply feedback in practice. To overcome this, the authors propose using each user's response time as an auxiliary signal and modeling every decision with a drift-diffusion model (DDM). This yields a consistent estimator of the population-average preference that requires neither user identities nor repeated annotations. The study shows that augmenting binary choices with response times restores the identifiability of the average preference, which choices alone cannot provide. On both synthetic and real-world datasets, the method consistently outperforms conventional baselines, which plateau at a bias floor induced by preference heterogeneity. Because response times are essentially free to record and require no user tracking, they can be added to preference data-collection pipelines without user-level identifiers.
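The decision model mentioned above can be illustrated with a small simulation. This is a minimal sketch that assumes the standard symmetric DDM parameterization: evidence starts at 0, drifts at rate v, has unit noise, and stops at boundaries ±a. The summary does not state the paper's exact parameterization, and the function names simulate_ddm, ddm_choice_prob, and ddm_mean_time are hypothetical. The sketch simulates choices and response times and checks them against the textbook closed forms P(upper) = 1/(1 + exp(-2av)) and E[T] = (a/v)·tanh(av).

```python
import math
import random


def simulate_ddm(v, a, n_trials, dt=1e-3, seed=0):
    """Simulate n_trials of a symmetric drift-diffusion model (assumed parameterization).

    Evidence starts at 0 and evolves as dX = v dt + dW (unit noise) until it
    first leaves (-a, a). Returns (choice, decision_time) pairs, where
    choice = +1 for the upper boundary and -1 for the lower boundary.
    Euler-Maruyama discretization: boundary overshoot adds a small bias
    that shrinks as dt -> 0.
    """
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    out = []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while -a < x < a:
            x += v * dt + sqrt_dt * rng.gauss(0.0, 1.0)
            t += dt
        out.append((1 if x >= a else -1, t))
    return out


def ddm_choice_prob(v, a):
    """Closed form for the symmetric DDM: P(upper) = 1 / (1 + exp(-2 a v))."""
    return 1.0 / (1.0 + math.exp(-2.0 * a * v))


def ddm_mean_time(v, a):
    """Closed form for the symmetric DDM: E[T] = (a / v) tanh(a v); the v -> 0 limit is a**2."""
    if abs(v) < 1e-12:
        return a * a
    return (a / v) * math.tanh(a * v)


if __name__ == "__main__":
    v, a = 0.5, 1.0
    trials = simulate_ddm(v, a, n_trials=1500)
    p_hat = sum(c == 1 for c, _ in trials) / len(trials)
    t_hat = sum(t for _, t in trials) / len(trials)
    print(f"P(upper): simulated {p_hat:.3f}, closed form {ddm_choice_prob(v, a):.3f}")
    print(f"E[T]:     simulated {t_hat:.3f}, closed form {ddm_mean_time(v, a):.3f}")
```

The choice probability depends only on the product a·v, while the mean decision time depends on a and v separately. A choice paired with its response time therefore carries more information about a decision than the choice alone.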

📝 Abstract

Aligning large language models (LLMs) to human preferences typically relies on aggregating pooled feedback into a single reward model. However, this standard approach assumes that all labelers share the same underlying preferences, ignoring the fact that real-world labelers are highly heterogeneous and usually anonymous. Consequently, relying solely on binary choice data fundamentally distorts the learned policy, making the true population-average preference unidentifiable. To overcome this critical limitation, we demonstrate that augmenting preference datasets with a simple secondary signal, the user's response time, can restore the identifiability of the population's average preference. By modeling each decision as a Drift-Diffusion Model (DDM), we introduce a novel, consistent estimator of the average preference of a heterogeneous population that corrects the distortions of standard choice-only labels. We prove that our estimator asymptotically converges to the true average preference even in the extreme case where each anonymous labeler contributes only a single choice. Empirically, across both synthetic and real-world datasets, our method consistently outperforms standard baselines, which plateau at a bias floor. Because response times are essentially free to record and require no user tracking or identification, our results open new opportunities for data-collection pipelines that improve social benefit without requiring user-level identifiers or repeated elicitations.
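A toy calculation can illustrate the unidentifiability claim. This sketch is not the paper's estimator. It assumes the standard symmetric DDM closed forms, a shared and known boundary a = 1, and two hypothetical populations. In population A, half the labelers have drift +2 and half have drift -1, so the mean drift is 0.5. Population B is homogeneous, and its single drift is set to the value that a pooled, choice-only logistic fit would recover from A in the infinite-data limit. All names in the code are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def choice_prob(v, a):
    # Assumed symmetric DDM with unit noise: P(choose option 1) = sigmoid(2 a v)
    return sigmoid(2.0 * a * v)


def mean_time(v, a):
    # Assumed symmetric DDM with unit noise: E[T] = (a / v) tanh(a v), limit a**2 at v = 0
    return a * a if abs(v) < 1e-12 else (a / v) * math.tanh(a * v)


def population_stats(drifts, a):
    """Pooled choice probability, mean drift, and mean decision time for a
    population where each (equally weighted) anonymous labeler has their own drift."""
    n = len(drifts)
    p = sum(choice_prob(v, a) for v in drifts) / n
    mean_v = sum(drifts) / n
    mean_t = sum(mean_time(v, a) for v in drifts) / n
    return p, mean_v, mean_t


a = 1.0
# Hypothetical population A: half strongly prefer option 1, half mildly prefer option 2.
p_a, v_a, t_a = population_stats([2.0, -1.0], a)
# Hypothetical population B: homogeneous drift that reproduces A's pooled choice rate,
# i.e. what a pooled choice-only logistic fit converges to.
v_b = logit(p_a) / (2.0 * a)
p_b, _, t_b = population_stats([v_b], a)

print(f"A: P(choose 1)={p_a:.4f}  mean drift={v_a:.4f}  mean RT={t_a:.4f}")
print(f"B: P(choose 1)={p_b:.4f}  mean drift={v_b:.4f}  mean RT={t_b:.4f}")
```

Both populations produce the same choice rate of about 0.55. Their mean drifts differ, however: 0.5 versus roughly 0.10. In this toy setting, a choice-only fit returns the homogeneous answer no matter how many labels it sees, which is the kind of bias floor described above. The mean response times, roughly 0.62 versus 1.00, still separate the two populations.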

Problem

Research questions and friction points this paper is trying to address.

preference alignment

heterogeneous preferences

response time

reward modeling

labeler anonymity

Innovation

Methods, ideas, or system contributions that make the work stand out.

response time

preference alignment

heterogeneous preferences