Context Matters: Learning Generalizable Rewards via Calibrated Features

📅 2025-06-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Human preferences vary with context, which impedes the generalization of learned reward functions. Existing approaches either treat each context as an independent task or model contextual dependencies only implicitly, resulting in poor sample efficiency. This paper proposes a decoupled framework that explicitly separates context-invariant preferences from context-dependent feature saliency, yielding modular, transferable reward representations. It introduces calibrated features, representations that capture how context modulates feature saliency, together with specialized paired comparison queries that isolate saliency from preference for efficient learning. In experiments with simulated users, the method requires roughly 10x fewer preference queries than baselines to reach equivalent reward accuracy, and performs up to 15% better in low-data regimes (5-10 queries). An in-person user study (N=12) confirms that participants can effectively teach their own context-dependent preferences, supporting personalized, context-sensitive reward learning.
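
The separation described above can be pictured as a reward that applies fixed preference weights to features whose saliency is rescaled by context. Below is a minimal Python sketch of that idea; the multiplicative form, the function names (calibrated_reward, stove_saliency), and the stove example numbers are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

# Assumed decomposition: reward = sum_i  w_i * g_i(context) * phi_i(state)
# where w_i are context-invariant preference weights and g_i(context) is the
# context-dependent saliency ("calibrated feature") of feature i.

def calibrated_reward(features, preference_weights, saliency_fn, context):
    """Reward from context-invariant preferences over calibrated features.

    features           : raw feature values phi(state), shape (d,)
    preference_weights : context-invariant weights w, shape (d,)
    saliency_fn        : maps a context to per-feature saliencies g(c), shape (d,)
    context            : any representation of the current context
    """
    saliency = saliency_fn(context)            # context-dependent part
    calibrated = saliency * features           # calibrated features
    return float(np.dot(preference_weights, calibrated))  # context-invariant part

# Illustrative context: proximity to the stove becomes more salient when the
# stove is hot, while the safety-over-efficiency preference (weights) is unchanged.
def stove_saliency(context):
    return np.array([context["stove_temp"],    # proximity saliency scales with heat
                     1.0])                     # efficiency feature unaffected

w = np.array([-2.0, 0.5])                      # prefer safety over efficiency
phi = np.array([0.8, 0.3])                     # close to stove, moderate effort saved
print(calibrated_reward(phi, w, stove_saliency, {"stove_temp": 0.9}))  # hot stove
print(calibrated_reward(phi, w, stove_saliency, {"stove_temp": 0.1}))  # cold stove
```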

📝 Abstract
A key challenge in reward learning from human input is that desired agent behavior often changes based on context. Traditional methods typically treat each new context as a separate task with its own reward function. For example, if a previously ignored stove becomes too hot to be around, the robot must learn a new reward from scratch, even though the underlying preference for prioritizing safety over efficiency remains unchanged. We observe that context influences not the underlying preference itself, but rather the saliency--or importance--of reward features. For instance, stove heat affects the importance of the robot's proximity, yet the human's safety preference stays the same. Existing multi-task and meta-IRL methods learn context-dependent representations implicitly--without distinguishing between preferences and feature importance--resulting in substantial data requirements. Instead, we propose explicitly modeling context-invariant preferences separately from context-dependent feature saliency, creating modular reward representations that adapt to new contexts. To achieve this, we introduce calibrated features--representations that capture contextual effects on feature saliency--and present specialized paired comparison queries that isolate saliency from preference for efficient learning. Experiments with simulated users show our method significantly improves sample efficiency, requiring 10x fewer preference queries than baselines to achieve equivalent reward accuracy, with up to 15% better performance in low-data regimes (5-10 queries). An in-person user study (N=12) demonstrates that participants can effectively teach their unique personal contextual preferences using our method, enabling more adaptable and personalized reward learning.
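
To make "paired comparison queries that isolate saliency from preference" concrete, here is a minimal sketch that fits per-feature saliencies for a single context from pairwise answers while holding the preference weights fixed. The logistic (Bradley-Terry-style) choice model, the gradient-ascent estimator, and names such as fit_saliency are assumptions for illustration; the paper's actual query design and estimator may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_saliency(queries, answers, w, d, lr=0.1, steps=500):
    """Estimate saliency g for one context from paired comparisons.

    queries : list of (phi_a, phi_b) feature vectors for trajectory pairs
              shown to the human in the same context
    answers : list of 1 if the human preferred a, else 0
    w       : fixed, context-invariant preference weights, shape (d,)
    """
    g = np.ones(d)                                  # initial saliency estimate
    for _ in range(steps):
        grad = np.zeros(d)
        for (phi_a, phi_b), y in zip(queries, answers):
            diff = phi_a - phi_b
            p_a = sigmoid(np.dot(w * g, diff))      # P(prefer a | current g)
            grad += (y - p_a) * (w * diff)          # gradient of the log-likelihood
        g += lr * grad                              # gradient ascent on g only
    return g

# Usage: two queries in a "hot stove" context; the human keeps choosing the
# trajectory that stays farther from the stove, so proximity saliency grows.
w = np.array([-2.0, 0.5])                           # fixed preference weights
queries = [(np.array([0.2, 0.3]), np.array([0.8, 0.6])),
           (np.array([0.1, 0.2]), np.array([0.9, 0.1]))]
answers = [1, 1]
print(fit_saliency(queries, answers, w, d=2))
```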
Problem

Research questions and friction points this paper is trying to address.

Learning generalizable rewards from human input across varying contexts
Distinguishing context-invariant preferences from context-dependent feature saliency
Improving sample efficiency in reward learning with calibrated features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicitly models context-invariant preferences separately from context-dependent feature saliency
Introduces calibrated features to capture contextual effects on feature saliency
Uses specialized paired comparison queries that isolate saliency from preference for efficient learning
🔎 Similar Papers
No similar papers found.