Beyond Pixels: Learning Invariant Rewards for Real-World Robotics From a Few Demonstrations

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of designing reward functions that generalize for reinforcement learning in open-world settings, where existing vision-based reward models tend to memorize pixel distributions and fail to adapt to variations in object identity, position, or viewpoint. The authors propose a framework that learns invariant symbolic reward functions from as few as five demonstrations by discovering task-level behavioral invariants offline, without online interaction. This shifts reward modeling from pixel-level fitting to a structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance, paired with a hybrid symbolic-numerical procedure that distills the invariants from demonstrations. On eight Meta-World tasks and three Franka manipulation tasks, the method achieves stronger process alignment and policy rollout ranking than baselines and accelerates downstream policy learning. Three real-world out-of-distribution experiments show that the same learned reward transfers zero-shot across variations in object position, camera viewpoint, and object identity.

📝 Abstract

Designing reward functions that generalize beyond controlled laboratory settings remains a fundamental challenge in reinforcement learning for robotics. In open-world manipulation problems, a single task can appear in numerous variants through different object instances, positions, and camera viewpoints. Recent vision-based reward models tend to memorize specific pixel distributions and fail to generalize beyond their training conditions. To address this, we propose a framework that learns invariant symbolic reward functions from as few as five demonstrations. The insight is to shift from visual feature-fitting to the discovery of behavioral invariants: task-level properties that remain constant across diverse visual instantiations. The framework has two coupled components: a structural reward formulation that encodes task-level strategies and physical constraints while preserving optimal policy invariance, and a hybrid symbolic-numerical procedure that distills these invariants from demonstrations without online interaction. Experiments on eight Meta-World tasks and three Franka manipulation tasks demonstrate that our method achieves stronger process alignment and policy rollout ranking abilities compared to baselines, accelerating downstream policy learning. Three real-world out-of-distribution experiments further show that the same learned reward generalizes zero-shot to position, viewpoint, and object variations, enabling a single reward representation to be reused across diverse task variants in practice.
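To make the idea of an invariant reward that preserves the optimal policy more concrete, here is a minimal sketch. It is not the authors' implementation: it assumes the classic potential-based shaping form, r + γ·Φ(s') − Φ(s), as one well-known way to keep the optimal policy unchanged, and it uses a hypothetical hand-written potential over relative distances. The names `potential`, `shaped_reward`, and `translate`, the state layout, and the discount value are all illustrative assumptions.

```python
import math

GAMMA = 0.99  # discount factor (assumed value, for illustration only)


def potential(state):
    """Hypothetical symbolic potential built from relative distances.

    `state` maps entity names to 3-D positions. Only distances between
    entities are used, so the value does not change when the whole scene
    is translated, and no pixels are involved.
    """
    reach = math.dist(state["gripper"], state["object"])  # gripper to object
    place = math.dist(state["object"], state["goal"])     # object to goal
    return -(reach + place)


def shaped_reward(state, next_state, task_reward=0.0, gamma=GAMMA):
    """Potential-based shaping term added to a (possibly sparse) task reward."""
    return task_reward + gamma * potential(next_state) - potential(state)


def translate(state, offset):
    """Shift every entity by the same offset (a change of object position)."""
    return {
        name: tuple(p + o for p, o in zip(pos, offset))
        for name, pos in state.items()
    }


# Two consecutive states in which the gripper moves toward the object.
s0 = {"gripper": (0.0, 0.0, 0.3), "object": (0.2, 0.0, 0.0), "goal": (0.5, 0.1, 0.0)}
s1 = {"gripper": (0.1, 0.0, 0.15), "object": (0.2, 0.0, 0.0), "goal": (0.5, 0.1, 0.0)}

r = shaped_reward(s0, s1)

# The same transition with the whole scene moved elsewhere on the table.
offset = (1.0, -2.0, 0.0)
r_shifted = shaped_reward(translate(s0, offset), translate(s1, offset))

print(f"reward: {r:.4f}, reward after shifting the scene: {r_shifted:.4f}")
```

Because the potential depends only on relations between entities, the two printed rewards agree up to floating-point error. That is the kind of invariance to position that a pixel-fitted reward model does not provide by construction.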

Problem

Research questions and friction points this paper is trying to address.

reward generalization

real-world robotics

visual invariance

out-of-distribution

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

invariant reward learning

symbolic reward functions

few-shot demonstration

behavioral invariants

zero-shot generalization

🔎 Similar Papers

Stabilizing Contrastive RL: Techniques for Robotic Goal Reaching from Offline Data

2023-06-06, International Conference on Learning Representations, Citations: 4