🤖 AI Summary
In home environments, long-horizon tasks are often under-specified: explicit user preferences are unavailable, which hinders preference-aligned execution. Method: We propose an active-questioning paradigm for preference-adaptive task execution and introduce ADAPT, the first benchmark that supports preference identification via active questioning. We further design Reflection-DPO, a training framework that jointly models the three-stage policy ("when to ask," "what to ask," and "how to execute") by combining reflective reasoning with teacher-student distillation. Contribution/Results: On ADAPT, Reflection-DPO improves the satisfaction rate on unseen user preferences by 6.1% (absolute) over zero-shot chain-of-thought baselines. This work provides the first systematic validation of active, interactive preference learning, demonstrating both efficacy and scalability for long-horizon task execution under implicit preferences.
📝 Abstract
Assistive agents should be able to perform under-specified long-horizon tasks while respecting user preferences. We introduce Actively Discovering and Adapting to Preferences for any Task (ADAPT) -- a benchmark designed to evaluate agents' ability to adhere to user preferences across various household tasks through active questioning. Next, we propose Reflection-DPO, a novel training approach for adapting large language models (LLMs) to the task of active questioning. Reflection-DPO finetunes a 'student' LLM to follow the actions of a privileged 'teacher' LLM, optionally asking a question to gather the information needed to better predict the teacher's action. We find that prior approaches using state-of-the-art LLMs fail to sufficiently follow user preferences in ADAPT, due to insufficient questioning and poor adherence to elicited preferences. In contrast, Reflection-DPO achieves a higher rate of satisfying user preferences, outperforming a zero-shot chain-of-thought baseline by 6.1% on unseen users.
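To make the "-DPO" part of Reflection-DPO concrete: the method builds on Direct Preference Optimization, whose standard per-pair loss is shown in the sketch below. The pairing of a teacher-aligned action as "chosen" against a student rollout as "rejected" is an illustrative assumption here, not the paper's exact construction; the loss itself is the standard DPO objective over log-probabilities under the policy and a frozen reference model.

```python
import math

def dpo_loss(pol_chosen: float, pol_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair, given sequence
    log-probabilities under the trained policy and a frozen reference.

    In a Reflection-DPO-style setup (an assumption for illustration),
    'chosen' would be the privileged teacher's action, possibly preceded
    by a clarifying question, and 'rejected' a student action that
    ignored the user's preference.
    """
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    # Negative log-sigmoid of the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# With zero margin the loss is ln(2); as the policy shifts probability
# mass toward the chosen (teacher-aligned) action, the loss decreases.
baseline = dpo_loss(-2.0, -2.0, -2.0, -2.0)   # margin = 0
improved = dpo_loss(-1.0, -5.0, -2.0, -2.0)   # margin = 4
```

Minimizing this loss pushes the student toward teacher-aligned behavior without an explicit reward model, which is what makes DPO a natural fit for distilling a privileged teacher.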