Ego2World: Compiling Egocentric Cooking Videos into Executable Worlds for Belief-State Planning

📅 2026-05-13

🤖 AI Summary
This work addresses planning for embodied agents in partially observable home environments, where an agent must remember objects, track state changes, and recover from failed actions. It translates real first-person cooking videos from the HD-EPIC dataset into an executable symbolic world simulator governed by graph-transition rules. The simulator keeps the hidden ground-truth world graph separate from the agent’s belief graph, so the agent must maintain its own belief in order to plan. Experiments show that action-overlap metrics overestimate physical-state task success, and that a persistent belief memory improves task completion while reducing repeated visual exploration.
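The gap between action overlap and physical-state success can be illustrated with a toy case. The sketch below is not the paper's metric or simulator: the functions `action_overlap` and `execute`, the fridge and milk example, and the rule that milk can only be taken while the fridge is open are all assumptions made for illustration.

```python
def action_overlap(predicted, reference):
    """Fraction of reference actions that also appear in the predicted plan.

    Order is ignored, which is exactly what makes this score optimistic.
    """
    if not reference:
        return 0.0
    predicted_set = set(predicted)
    return sum(1 for a in reference if a in predicted_set) / len(reference)


def execute(plan, state):
    """Run a plan on a toy state; an action whose precondition fails has no effect."""
    state = dict(state)  # copy so the caller's state is not mutated
    for verb, obj in plan:
        if verb == "open":
            state[obj] = "open"
        elif verb == "close":
            state[obj] = "closed"
        elif verb == "take" and state["fridge"] == "open":
            # Hypothetical precondition: the milk is only reachable when the fridge is open.
            state[obj] = "held"
    return state


initial = {"fridge": "closed", "milk": "in_fridge"}
reference = [("open", "fridge"), ("take", "milk"), ("close", "fridge")]
# Same actions as the reference, but "take" comes before "open".
predicted = [("take", "milk"), ("open", "fridge"), ("close", "fridge")]

overlap = action_overlap(predicted, reference)
goal_reached = execute(predicted, initial)["milk"] == "held"
print(overlap, goal_reached)
```

Here the predicted plan contains every reference action, so the overlap score is perfect, yet executing it leaves the milk in the fridge because the first action fails its precondition.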
📝 Abstract
Embodied agents in household environments must plan under partial observation: they need to remember objects, track state changes, and recover when actions fail. Existing benchmarks only partially test this ability. Egocentric video datasets capture realistic human activities but remain passive, while interactive simulators support execution but rely on synthetic scenes and hand-crafted dynamics, introducing a sim-to-real gap and often assuming fully observable state. We introduce Ego2World, an executable benchmark that turns egocentric cooking videos into executable symbolic worlds governed by graph-transition rules. Built on HD-EPIC, Ego2World derives reusable transition rules from video annotations and executes them in a hidden symbolic world graph. During evaluation, the simulator maintains the hidden world graph, while the agent plans over its own partial belief graph using only local observations and execution feedback. This separation forces agents to update memory and replan without observing the true world state. Experiments show that action-overlap scores overestimate physical-state success, and that persistent belief memory improves task completion while reducing repeated visual exploration -- suggesting that belief maintenance should be a first-class target of embodied-agent evaluation.
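The separation between the hidden world graph and the agent's belief graph might look roughly like the following minimal sketch. It is not Ego2World's implementation: the classes `Rule`, `HiddenWorld`, and `BeliefAgent`, the edge triples, and the feedback format (success flag plus added and removed edges) are hypothetical names and choices made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical graph-transition rule over (subject, relation, object) edges."""
    action: str
    pre: frozenset      # edges that must exist in the world graph
    add: frozenset      # edges created when the action succeeds
    delete: frozenset   # edges removed when the action succeeds


class HiddenWorld:
    """Holds the ground-truth graph; the agent never reads it directly."""

    def __init__(self, edges, rules):
        self._edges = set(edges)
        self._rules = {r.action: r for r in rules}

    def step(self, action):
        rule = self._rules.get(action)
        if rule is None or not rule.pre <= self._edges:
            # Failed execution: the agent only learns that the action failed.
            return False, frozenset(), frozenset()
        self._edges = (self._edges - rule.delete) | rule.add
        # Local feedback: only the edges this action changed.
        return True, rule.add, rule.delete


class BeliefAgent:
    """Keeps a partial belief graph built purely from execution feedback."""

    def __init__(self):
        self.belief = set()
        self.failed = []

    def update(self, action, ok, added, removed):
        if ok:
            self.belief = (self.belief - removed) | added
        else:
            self.failed.append(action)  # remembered so the agent can replan


closed = ("fridge", "is", "closed")
opened = ("fridge", "is", "open")
stored = ("milk", "in", "fridge")
held = ("milk", "held_by", "agent")

rules = [
    Rule("open_fridge", frozenset({closed}), frozenset({opened}), frozenset({closed})),
    Rule("take_milk", frozenset({opened, stored}), frozenset({held}), frozenset({stored})),
]
world = HiddenWorld({closed, stored}, rules)
agent = BeliefAgent()

# The first attempt fails because the fridge is closed; the agent then recovers.
for action in ["take_milk", "open_fridge", "take_milk"]:
    agent.update(action, *world.step(action))

print(sorted(agent.belief), agent.failed)
```

The agent's belief ends up containing only what its own actions revealed, which is the property that makes belief maintenance observable in an evaluation of this kind.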
Problem

Research questions and friction points this paper is trying to address.

embodied agents
partial observability
belief-state planning
executable benchmark
sim-to-real gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ego2World
belief-state planning
executable symbolic world
egocentric video
partial observability