Implicit State Estimation via Video Replanning

📅 2025-10-20

📈 Citations: 0

✨ Influential: 0

career value

196K/year

🤖 AI Summary

Existing video-based planning methods struggle with interaction failures in partially observable environments due to their lack of online reasoning under environmental uncertainty. This paper introduces the first online video planning framework supporting real-time data fusion during interaction, enabling implicit state estimation without explicit state modeling—achieved through dynamic model parameter updating and implicit filtering of failed trajectories. The method integrates spatiotemporal video representation learning, online model adaptation, dynamic plan pruning, and a re-planning architecture. Evaluated on a newly constructed simulated manipulation benchmark, the framework achieves significant improvements: +32% in re-planning efficiency and +27% in task success rate. It enhances decision robustness and system adaptability in complex, dynamic scenarios, advancing video-driven decision-making toward practical deployment.

Technology Category

Application Category

📝 Abstract

Video-based representations have gained prominence in planning and decision-making due to their ability to encode rich spatiotemporal dynamics and geometric relationships. These representations enable flexible and generalizable solutions for complex tasks such as object manipulation and navigation. However, existing video planning frameworks often struggle to adapt to failures at interaction time due to their inability to reason about uncertainties in partially observed environments. To overcome these limitations, we introduce a novel framework that integrates interaction-time data into the planning process. Our approach updates model parameters online and filters out previously failed plans during generation. This enables implicit state estimation, allowing the system to adapt dynamically without explicitly modeling unknown state variables. We evaluate our framework through extensive experiments on a new simulated manipulation benchmark, demonstrating its ability to improve replanning performance and advance the field of video-based decision-making.

Problem

Research questions and friction points this paper is trying to address.

Estimating implicit states in partially observed environments

Adapting video planning frameworks to interaction-time failures

Improving replanning performance without explicit state modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online model parameter updates during planning

Filtering out previously failed plan generations

Implicit state estimation without explicit modeling

🔎 Similar Papers

No similar papers found.