Active Query Synthesis for Preference Learning

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses user preference learning, which typically relies on costly annotated data. Existing active learning approaches incur high computational overhead from pool-based evaluation and do not account for varying reliability in user feedback. The authors propose Info-Synth, a framework that combines a confidence-aware response model with active query synthesis in a continuous space. The response model explicitly accounts for ambiguous comparisons between nearly identical or entirely dissimilar items, and Info-Synth generates queries by maximizing a mutual information-based objective. Two strategies, Pair M-dist and Pair Opt-dist, extend Info-Synth to settings restricted to finite query pools. The framework is demonstrated on synthetic preference learning, constrained text summary datasets, and controller gain tuning for a simulated mobile robot.

📝 Abstract

Efficient learning of user preferences is crucial for many modern decision-making systems but typically requires costly labeled data. Active learning reduces this cost, yet standard methods are computationally expensive due to pool-based evaluation. Further, most methods assume all query feedback is equally reliable, ignoring that pairwise queries between nearly identical or entirely dissimilar items yield ambiguous, low-confidence responses. To address the issue of feedback reliability, we introduce a novel confidence-aware response model that explicitly accounts for these ambiguous comparisons. To overcome the computational bottleneck of pool-based evaluation, we propose an active query synthesis framework, Info-Synth, that generates optimal queries by maximizing a mutual information-based objective within a continuous space. Moreover, we propose two strategies, Pair M-dist and Pair Opt-dist, that extend Info-Synth to select effective queries even when restricted to finite query pools. We demonstrate our framework's versatility and performance across synthetic preference learning, constrained text summary datasets, and subjective, continuous-space controller gain tuning for a simulated mobile robot.
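
To make the idea of synthesizing a query by maximizing mutual information more concrete, here is a minimal sketch in Python. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the linear utility with posterior samples `W`, the hypothetical `confidence` curve (zero for identical items, peaking at distance `d0`, decaying for very dissimilar items), the way confidence shrinks the response probability toward 0.5, and the simple random hill-climbing search used in place of whatever optimizer the paper uses.

```python
import numpy as np

def binary_entropy(p):
    # Entropy (in nats) of a Bernoulli(p) response.
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def confidence(d, d0=1.0):
    # Hypothetical confidence curve (an assumption, not from the paper):
    # 0 at d = 0, equal to 1 at d = d0, decaying toward 0 for large d.
    r = d / d0
    return r * np.exp(1.0 - r)

def response_prob(xa, xb, W, d0=1.0):
    # P(user prefers xa over xb) for each posterior sample in W (shape S x D).
    # Low confidence pulls the probability toward a coin flip (0.5).
    diff = xa - xb
    c = confidence(np.linalg.norm(diff), d0)
    sig = 1.0 / (1.0 + np.exp(-(W @ diff)))
    return 0.5 + c * (sig - 0.5)

def mutual_information(xa, xb, W, d0=1.0):
    # I(y; w | query) = H(E_w[p]) - E_w[H(p)], estimated from samples.
    p = response_prob(xa, xb, W, d0)
    return binary_entropy(p.mean()) - binary_entropy(p).mean()

def synthesize_query(W, dim, n_starts=20, n_steps=200, step=0.1, seed=0):
    # Search the continuous box [-1, 1]^dim for a pair (xa, xb) with high
    # mutual information, using random restarts and greedy hill climbing.
    rng = np.random.default_rng(seed)
    best_q, best_mi = None, -np.inf
    for _ in range(n_starts):
        q = rng.uniform(-1, 1, size=2 * dim)
        mi = mutual_information(q[:dim], q[dim:], W)
        for _ in range(n_steps):
            cand = np.clip(q + step * rng.normal(size=2 * dim), -1, 1)
            cand_mi = mutual_information(cand[:dim], cand[dim:], W)
            if cand_mi > mi:  # keep only improving moves
                q, mi = cand, cand_mi
        if mi > best_mi:
            best_q, best_mi = q, mi
    return best_q[:dim], best_q[dim:], best_mi

# Usage: posterior samples over a 3-dimensional preference weight vector.
W = np.random.default_rng(1).normal(size=(500, 3))
xa, xb, mi = synthesize_query(W, dim=3)
print("synthesized pair:", xa, xb, "estimated MI (nats):", mi)
```

Under these assumptions, a pair of identical items has zero confidence and therefore zero mutual information, so the search steers away from such comparisons. Because the query is optimized directly in the continuous space, no candidate pool has to be scored exhaustively.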

Problem

Research questions and friction points this paper is trying to address.

preference learning

active learning

query synthesis

feedback reliability

computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

active query synthesis

confidence-aware preference learning

mutual information optimization