Preference Learning with Response Time

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing preference learning methods rely predominantly on binary choice data, neglecting the decision-confidence information embedded in user response times. This work introduces the EZ-diffusion model, a simplified drift-diffusion model grounded in evidence accumulation, into preference learning for the first time. We propose a joint modeling framework based on a Neyman-orthogonal loss that simultaneously learns the reward function and the response-time dynamics. Theoretically, we establish that our estimator achieves oracle-optimal convergence rates in both linear and nonparametric settings, reducing the estimation error's dependence on reward magnitude from exponential to polynomial. Empirically, on image preference tasks, our method attains comparable performance using only one-third of the training samples, demonstrating substantial gains in sample efficiency and generalization robustness. The core innovation is the integration of a time-aware model of the decision process with orthogonalized estimation techniques.
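
Under this evidence-accumulation view, the reward gap between the two options acts as the drift of a noisy diffusion between two absorbing barriers, so both the choice probability and the mean decision time have closed forms in that gap. The following is a minimal sketch of this mapping, assuming a symmetric diffusion with unit noise and barriers at ±a; the `barrier` parameter and the function name are illustrative, not taken from the paper.

```python
import numpy as np

def ez_choice_and_decision_time(reward_gap, barrier=1.0):
    """First-passage statistics of a drift-diffusion process with
    drift = reward_gap, unit noise, and absorbing barriers at
    +/- barrier (an illustrative parameterization)."""
    d = np.asarray(reward_gap, dtype=float)
    # Probability of choosing the first option (hitting the upper
    # barrier): logistic in 2 * barrier * reward_gap.
    p_first = 1.0 / (1.0 + np.exp(-2.0 * barrier * d))
    # Mean decision time E[T] = (a / gap) * tanh(a * gap), with the
    # limit a^2 as the gap approaches zero.
    with np.errstate(divide="ignore", invalid="ignore"):
        mean_t = np.where(
            np.isclose(d, 0.0),
            barrier ** 2,
            (barrier / d) * np.tanh(barrier * d),
        )
    return p_first, mean_t
```

A near-zero gap yields p ≈ 0.5 together with the slowest responses (mean time approaching a²), while a large gap yields fast, near-deterministic choices; response time therefore carries the confidence signal that binary labels alone discard.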

📝 Abstract
This paper investigates the integration of response time data into human preference learning frameworks for more effective reward model elicitation. While binary preference data has become fundamental in fine-tuning foundation models, generative AI systems, and other large-scale models, the valuable temporal information inherent in user decision-making remains largely unexploited. We propose novel methodologies to incorporate response time information alongside binary choice data, leveraging the Evidence Accumulation Drift Diffusion (EZ) model, under which response time is informative of the preference strength. We develop Neyman-orthogonal loss functions that achieve oracle convergence rates for reward model learning, matching the theoretically optimal rates that would be attained if the expected response times for each query were known a priori. Our theoretical analysis demonstrates that for linear reward functions, conventional preference learning suffers from error rates that scale exponentially with reward magnitude. In contrast, our response time-augmented approach reduces this to polynomial scaling, representing a significant improvement in sample efficiency. We extend these guarantees to non-parametric reward function spaces, establishing convergence properties for more complex, realistic reward models. Our extensive experiments validate our theoretical findings in the context of preference learning over images.
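
The oracle-rate claim rests on making the reward loss insensitive, to first order, to errors in an estimated nuisance: each query's expected response time. Under the diffusion model with barrier a = 1 and signed choice Y ∈ {−1, +1}, the identity E[Y | x] = δ(x) · E[T | x] holds, where δ is the reward gap. The sketch below shows one way to build a Neyman-orthogonal square loss from that identity for a linear reward gap; the specific loss, names, and training loop are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def orthogonal_loss_grad(theta, phi, y, t, m_hat):
    """Gradient of an illustrative Neyman-orthogonal loss for a linear
    reward gap delta(x) = phi(x) @ theta, built from the diffusion
    identity E[Y|x] = delta(x) * E[T|x] (barrier a = 1).

    Per-sample loss: (delta*m_hat - Y)^2 + delta^2 * m_hat * (T - m_hat).
    The second term cancels the first-order effect of errors in the
    nuisance estimate m_hat of E[T|x]; that first-order insensitivity
    is the defining property of Neyman orthogonality.
    """
    delta = phi @ theta
    per_sample = 2.0 * ((delta * m_hat - y) * m_hat
                        + delta * m_hat * (t - m_hat))
    return phi.T @ per_sample / len(y)

def fit_reward_gap(phi, y, t, lr=0.05, steps=3000):
    """Two-stage estimation: (1) estimate the nuisance E[T|x]; a crude
    constant fit is used as a stand-in here (in practice one would
    cross-fit a flexible regressor), then (2) run gradient descent on
    the orthogonal loss."""
    m_hat = np.full(len(t), t.mean())   # stage 1: nuisance estimate
    theta = np.zeros(phi.shape[1])
    for _ in range(steps):              # stage 2: orthogonal loss descent
        theta -= lr * orthogonal_loss_grad(theta, phi, y, t, m_hat)
    return theta
```

Because the loss gradient has zero pathwise derivative in the nuisance at the truth, a slowly converging first-stage estimate enters the reward error only at second order; this is how such estimators match the rate of an oracle that knows E[T | x] exactly.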
Problem

Research questions and friction points this paper is trying to address.

Integrating response time data into preference learning frameworks
Improving reward model elicitation using temporal decision-making information
Reducing error-rate scaling in preference learning by exploiting response times
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates response time into preference learning
Uses EZ model for preference strength inference
Develops Neyman-orthogonal loss functions