Robometer: Scaling General-Purpose Robotic Reward Models via Trajectory Comparisons

📅 2026-03-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a key limitation of existing general-purpose robotic reward models: they rely on absolute progress labels from expert trajectories and therefore scale poorly to large datasets rich in failed or suboptimal demonstrations, where dense progress annotation is ambiguous. To overcome this, the authors propose Robometer, a framework that jointly models intra-trajectory frame-level progress supervision and inter-trajectory preference supervision through a dual-objective loss combining a frame-level progress loss with a trajectory-wise preference ranking loss. This approach enables efficient learning of a universal reward function from diverse data, including expert, failed, and augmented trajectories. Contributions include a scalable reward-learning paradigm; the release of RBM-1M, a million-scale dataset of diverse robotic trajectories; and significant gains in reward-model generalization across multi-task and cross-embodiment settings, outperforming current state-of-the-art methods.
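The dual objective described above can be written schematically as follows (notation ours, assuming an MSE progress term and a Bradley-Terry-style comparison term; the paper's exact formulation may differ):

$$\mathcal{L} \;=\; \underbrace{\mathbb{E}_{\tau^{e},\,t}\!\left[\big(r_\theta(o_t) - p_t\big)^2\right]}_{\text{frame-level progress loss}} \;+\; \lambda\,\underbrace{\mathbb{E}_{(\tau^{+},\,\tau^{-})}\!\left[-\log \sigma\!\big(R_\theta(\tau^{+}) - R_\theta(\tau^{-})\big)\right]}_{\text{trajectory-wise preference loss}}$$

where $p_t$ is the ground-truth progress label at frame $o_t$ of an expert trajectory $\tau^{e}$, $R_\theta(\tau)$ aggregates per-frame rewards into a trajectory score, and $(\tau^{+},\tau^{-})$ is a preferred/rejected trajectory pair from the same task.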

📝 Abstract
General-purpose robot reward models are typically trained to predict absolute task progress from expert demonstrations, providing only local, frame-level supervision. While effective for expert demonstrations, this paradigm scales poorly to large-scale robotics datasets where failed and suboptimal trajectories are abundant and assigning dense progress labels is ambiguous. We introduce Robometer, a scalable reward modeling framework that combines intra-trajectory progress supervision with inter-trajectory preference supervision. Robometer is trained with a dual objective: a frame-level progress loss that anchors reward magnitude on expert data, and a trajectory-comparison preference loss that imposes global ordering constraints across trajectories of the same task, enabling effective learning from both real and augmented failed trajectories. To support this formulation at scale, we curate RBM-1M, a reward-learning dataset comprising over one million trajectories spanning diverse robot embodiments and tasks, including substantial suboptimal and failure data. Across benchmarks and real-world evaluations, Robometer learns more generalizable reward functions than prior methods and improves robot learning performance across a diverse set of downstream applications. Code, model weights, and videos at https://robometer.github.io/.
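The dual objective from the abstract can be sketched minimally in plain Python/NumPy. This is an illustrative composition of the two loss terms, not the authors' implementation: the MSE progress term, the Bradley-Terry-style preference term, and the weight `lam` are all assumptions on our part.

```python
import numpy as np


def progress_loss(pred_progress, true_progress):
    """Frame-level progress loss: MSE between predicted and ground-truth
    progress labels on expert frames (anchors reward magnitude)."""
    pred = np.asarray(pred_progress, dtype=float)
    true = np.asarray(true_progress, dtype=float)
    return float(np.mean((pred - true) ** 2))


def preference_loss(score_preferred, score_rejected):
    """Trajectory-comparison preference loss (Bradley-Terry style):
    -log sigmoid(margin), pushing the preferred trajectory's scalar
    score above the rejected one's."""
    margin = score_preferred - score_rejected
    # log1p(exp(-m)) == -log(sigmoid(m)), written in a stable form
    return float(np.log1p(np.exp(-margin)))


def dual_objective(pred_progress, true_progress,
                   score_preferred, score_rejected, lam=1.0):
    """Combined objective: frame-level progress loss on expert data plus
    a weighted trajectory-wise preference term (`lam` is a guess)."""
    return (progress_loss(pred_progress, true_progress)
            + lam * preference_loss(score_preferred, score_rejected))
```

In this sketch a trajectory's scalar score could be, for example, the mean of its per-frame predicted rewards; correctly ordering a successful trajectory above a failed one drives the preference term toward zero even when dense progress labels for the failed trajectory are unavailable.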
Problem

Research questions and friction points this paper is trying to address.

robotic reward models
trajectory comparisons
failed trajectories
reward learning
scalability
Innovation

Methods, ideas, or system contributions that make the work stand out.

reward modeling
trajectory comparison
preference learning
scalable robotics
failure-aware training