GIFT: Generalizing Intent for Flexible Test-Time Rewards

📅 2026-03-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the failure of existing reward learning methods that rely on spurious correlations in user demonstrations, leading to poor generalization under distribution shifts. To overcome this limitation, the authors propose a human-intention-based reward generalization framework that leverages large language models to infer high-level intent from demonstrations and constructs an intent-conditioned state-action equivalence mapping. This enables zero-shot cross-scenario reward transfer without retraining, marking the first approach to ground reward generalization explicitly in human intent rather than superficial visual or semantic cues. Evaluated on four simulated tabletop manipulation tasks involving over 50 unseen objects and on a real Franka Panda robot, the method significantly outperforms baseline approaches, achieving substantial improvements in both pairwise win rates at test time and state-alignment F1 scores.

📝 Abstract
Robots learn reward functions from user demonstrations, but these rewards often fail to generalize to new environments. This failure occurs because learned rewards latch onto spurious correlations in training data rather than the underlying human intent that demonstrations represent. Existing methods leverage visual or semantic similarity to improve robustness, yet these surface-level cues often diverge from what humans actually care about. We present Generalizing Intent for Flexible Test-Time Rewards (GIFT), a framework that grounds reward generalization in human intent rather than surface cues. GIFT leverages language models to infer high-level intent from user demonstrations by contrasting preferred with non-preferred behaviors. At deployment, GIFT maps novel test states to behaviorally equivalent training states via intent-conditioned similarity, enabling learned rewards to generalize across distribution shifts without retraining. We evaluate GIFT on tabletop manipulation tasks with new objects and layouts. Across four simulated tasks with over 50 unseen objects, GIFT consistently outperforms visual and semantic similarity baselines in test-time pairwise win rate and state-alignment F1 score. Real-world experiments on a 7-DoF Franka Panda robot demonstrate that GIFT reliably transfers to physical settings. Further discussion can be found at https://mit-clear-lab.github.io/GIFT/
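The abstract's core mechanism, mapping a novel test state to a behaviorally equivalent training state under intent-conditioned similarity and reusing that state's learned reward, can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the `embed`, `transfer_reward`, and `intent` names and the feature-weighting stand-in for an intent-conditioned encoder are all hypothetical.

```python
# Hypothetical sketch of GIFT-style test-time reward transfer (not the paper's code).
# Assumes an intent-conditioned embedding and a table of learned rewards over
# training states; all names and structures are illustrative.
import math

def embed(state, intent):
    # Toy stand-in for an intent-conditioned encoder: weight each state
    # feature by how relevant the inferred intent says it is, so that
    # intent-irrelevant (spurious) features drop out of the similarity.
    return [f * w for f, w in zip(state, intent["feature_weights"])]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u)) or 1.0
    nv = math.sqrt(sum(b * b for b in v)) or 1.0
    return dot / (nu * nv)

def transfer_reward(test_state, train_states, train_rewards, intent):
    """Map a novel test state to its most similar training state under
    intent-conditioned similarity and reuse that state's learned reward,
    requiring no retraining at deployment."""
    test_emb = embed(test_state, intent)
    sims = [cosine(test_emb, embed(s, intent)) for s in train_states]
    best = max(range(len(sims)), key=sims.__getitem__)
    return train_rewards[best], best

# Toy example: the inferred intent says only the first two features
# (e.g. object pose) matter; the third (e.g. color) is spurious.
intent = {"feature_weights": [1.0, 1.0, 0.0]}
train_states = [[0.1, 0.9, 0.5], [0.8, 0.2, 0.5]]
train_rewards = [1.0, -1.0]
reward, idx = transfer_reward([0.15, 0.85, 0.99], train_states, train_rewards, intent)
# Matches the first training state despite the shift in the spurious feature.
```

Under this reading, distribution shifts along intent-irrelevant dimensions (new colors, new objects) leave the similarity, and hence the transferred reward, unchanged.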
Problem

Research questions and friction points this paper is trying to address.

reward generalization
human intent
distribution shift
robot learning
test-time adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward generalization
human intent
language models
test-time adaptation
robot learning