Real-Time Verification of Embodied Reasoning for Generative Skill Acquisition

📅 2025-05-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Embodied agents in complex 3D environments struggle to acquire scalable control skills efficiently: exhaustive evaluation is computationally expensive, and the effectiveness of supervision from generalist agents such as LLMs remains unclear. This paper proposes VERGSA (Verifying Embodied Reasoning in Generative Skill Acquisition), a framework that carries verification methods from mathematical reasoning over to embodied learning, providing real-time verification and dense reward signals. Its core contributions are: (1) what the authors describe as the first comprehensive training dataset for verification-driven generative skill acquisition; (2) contextually relevant tasks incorporated dynamically into prompts, with success metrics defined for both subtasks and overall tasks; and (3) an automated reward labeling scheme that synthesizes dense rewards by iteratively finalizing the contribution of scene configuration and subtask learning to overall skill acquisition. In experiments, the exemplar task pool improves the average task success rate by 21%, the verification model boosts success rates by 24% on novel tasks and 36% on encountered tasks, and VERGSA outperforms LLM-as-a-Judge baselines in verification quality.

📝 Abstract

Generative skill acquisition enables embodied agents to actively learn a scalable and evolving repertoire of control skills, crucial for the advancement of large decision models. While prior approaches often rely on supervision signals from generalist agents (e.g., LLMs), their effectiveness in complex 3D environments remains unclear; exhaustive evaluation incurs substantial computational costs, significantly hindering the efficiency of skill learning. Inspired by recent successes in verification models for mathematical reasoning, we propose VERGSA (Verifying Embodied Reasoning in Generative Skill Acquisition), a framework that systematically integrates real-time verification principles into embodied skill learning. VERGSA establishes 1) a seamless extension from verification of mathematical reasoning into embodied learning by dynamically incorporating contextually relevant tasks into prompts and defining success metrics for both subtasks and overall tasks, and 2) an automated, scalable reward labeling scheme that synthesizes dense reward signals by iteratively finalizing the contribution of scene configuration and subtask learning to overall skill acquisition. To the best of our knowledge, this approach yields the first comprehensive training dataset for verification-driven generative skill acquisition, eliminating arduous manual reward engineering. Experiments validate the efficacy of our approach: 1) the exemplar task pool improves the average task success rate by 21%, 2) our verification model boosts success rates by 24% for novel tasks and 36% for encountered tasks, and 3) it outperforms LLM-as-a-Judge baselines in verification quality.

Problem

Research questions and friction points this paper is trying to address.

Real-time verification of embodied reasoning in generative skill acquisition

Improving efficiency of skill learning in complex 3D environments

Automating scalable reward labeling for generative skill acquisition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates real-time verification into skill learning

Automates scalable reward labeling for dense signals

Extends mathematical reasoning verification to embodied tasks
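
As a rough illustration of what subtask-level dense reward labels can look like, the sketch below scores each subtask by how often the overall task succeeds once that subtask has succeeded. This mirrors rollout-based labeling used for step-level verifiers in mathematical reasoning; it is an assumption made for illustration, not VERGSA's actual labeling scheme, and the names `Rollout` and `dense_rewards` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rollout:
    """One hypothetical attempt at a task, split into ordered subtasks."""
    subtask_success: List[bool]  # outcome of each subtask, in execution order
    task_success: bool           # outcome of the overall task


def dense_rewards(rollouts: List[Rollout], num_subtasks: int) -> List[float]:
    """Estimate, per subtask, the fraction of rollouts that completed the
    overall task among those in which that subtask succeeded.

    Illustrative assumption only: the source does not specify this formula.
    """
    rewards = []
    for i in range(num_subtasks):
        # Keep only rollouts where subtask i succeeded.
        reached = [r for r in rollouts if r.subtask_success[i]]
        if not reached:
            # No evidence for this subtask; fall back to zero reward.
            rewards.append(0.0)
        else:
            wins = sum(r.task_success for r in reached)
            rewards.append(wins / len(reached))
    return rewards


if __name__ == "__main__":
    demo = [
        Rollout([True, True], True),
        Rollout([True, False], False),
        Rollout([False, False], False),
        Rollout([True, True], True),
    ]
    print(dense_rewards(demo, num_subtasks=2))
```

Each entry in the returned list is a per-subtask label that a verification model could be trained to predict, which is one way a single sparse task-level outcome can be turned into a denser signal.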

🔎 Similar Papers

ExpertAF: Expert Actionable Feedback from Video

2024-08-01 | arXiv.org | Citations: 1
