OpenGVL - Benchmarking Visual Temporal Progress for Data Curation

📅 2025-09-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address data scarcity and high annotation costs in robotics, this paper studies the visual temporal progress prediction task: automatically estimating task completion from video observations to enable scalable, automated annotation and quality filtering of robotic data. The authors propose OpenGVL, an open-source benchmark spanning diverse manipulation tasks with both robotic and human embodiments, and conduct a systematic evaluation of mainstream vision-language models (VLMs) on cross-task and cross-agent progress reasoning. Experiments reveal that open-weight VLMs achieve only about 70% of their closed-weight counterparts' performance, exposing a gap in temporal understanding. The paper further demonstrates that the generative value learning (GVL) framework can serve as a practical tool for robotic data curation. OpenGVL is publicly released as an evaluation paradigm and toolkit for joint perception-decision modeling in robotics.

📝 Abstract
Data scarcity remains one of the most limiting factors in driving progress in robotics. However, the amount of available robotics data in the wild is growing exponentially, creating new opportunities for large-scale data utilization. Reliable temporal task completion prediction could help automatically annotate and curate this data at scale. The Generative Value Learning (GVL) approach was recently proposed, leveraging the knowledge embedded in vision-language models (VLMs) to predict task progress from visual observations. Building upon GVL, we propose OpenGVL, a comprehensive benchmark for estimating task progress across diverse challenging manipulation tasks involving both robotic and human embodiments. We evaluate the capabilities of publicly available open-source foundation models, showing that open-source model families significantly underperform closed-source counterparts, achieving only approximately 70% of their performance on temporal progress prediction tasks. Furthermore, we demonstrate how OpenGVL can serve as a practical tool for automated data curation and filtering, enabling efficient quality assessment of large-scale robotics datasets. We release the benchmark along with the complete codebase at github.com/budzianowski/opengvl.
Problem

Research questions and friction points this paper is trying to address.

Benchmarking visual temporal progress estimation for robotics tasks
Evaluating open-source foundation models for task completion prediction
Enabling automated data curation and filtering for robotics datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

OpenGVL benchmark evaluates task progress prediction
Leverages vision-language models for temporal progress estimation
Provides automated data curation for large-scale robotics datasets