PerpetualWonder: Long-Horizon Action-Conditioned 4D Scene Generation

📅 2026-02-04
🤖 AI Summary
Existing methods for action-conditioned, long-horizon 4D scene generation from a single image struggle to keep the physical state consistent with the visual representation, which often distorts the dynamics during interaction. This work proposes a closed-loop hybrid generative simulator, described by the authors as the first of its kind, that unifies physical and visual representations through a bidirectional coupling mechanism. Supervision gathered from multiple viewpoints, combined with action conditioning, lets the framework jointly optimize dynamics and appearance. Starting from a single input image, the method generates temporally extended, multi-step interactive 4D scenes that preserve both visual fidelity and physical plausibility, improving dynamic consistency across successive interactions.

📝 Abstract
We introduce PerpetualWonder, a hybrid generative simulator that enables long-horizon, action-conditioned 4D scene generation from a single image. Current works fail at this task because their physical state is decoupled from their visual representation, which prevents generative refinements to update the underlying physics for subsequent interactions. PerpetualWonder solves this by introducing the first true closed-loop system. It features a novel unified representation that creates a bidirectional link between the physical state and visual primitives, allowing generative refinements to correct both the dynamics and appearance. It also introduces a robust update mechanism that gathers supervision from multiple viewpoints to resolve optimization ambiguity. Experiments demonstrate that from a single image, PerpetualWonder can successfully simulate complex, multi-step interactions from long-horizon actions, maintaining physical plausibility and visual consistency.
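
The abstract's point that multi-view supervision resolves optimization ambiguity can be illustrated with a toy example. The sketch below is not the paper's method: it assumes idealized orthographic cameras, a single 3D displacement as the only unknown, and plain gradient descent on reprojection error. The names `project` and `fit_displacement` are hypothetical and introduced only for this illustration.

```python
# Toy illustration (an assumption, not PerpetualWonder's actual update rule):
# one view leaves the motion along the viewing axis unconstrained,
# while a second view from another direction pins it down.

def project(point, view_axis):
    """Orthographic projection: drop the coordinate along the viewing axis."""
    return [c for axis, c in enumerate(point) if axis != view_axis]

def fit_displacement(observations, steps=500, lr=0.1):
    """Minimize the summed squared reprojection error by gradient descent.

    observations maps a viewing axis (0=x, 1=y, 2=z) to the 2D target
    seen by a camera looking along that axis. The estimate starts at zero.
    """
    estimate = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for view_axis, target in observations.items():
            visible_axes = [a for a in range(3) if a != view_axis]
            for axis, t in zip(visible_axes, target):
                grad[axis] += 2.0 * (estimate[axis] - t)
        estimate = [e - lr * g for e, g in zip(estimate, grad)]
    return estimate

true_displacement = [0.3, -0.2, 0.5]

# Front camera only (looks along z): depth receives no gradient at all.
single_view = fit_displacement({2: project(true_displacement, 2)})

# Front camera plus a side camera (looks along x): every axis is observed.
multi_view = fit_displacement({
    2: project(true_displacement, 2),
    0: project(true_displacement, 0),
})

print("single view:", single_view)
print("multi view: ", multi_view)
```

With only the front camera, the depth component stays at its initial value because no observation constrains it; adding the side camera lets the same optimizer recover all three components.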

Problem

Research questions and friction points this paper is trying to address.

long-horizon
action-conditioned
4D scene generation
physical plausibility
visual consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

closed-loop generation
unified physical-visual representation
action-conditioned 4D scene generation
multi-view supervision
long-horizon simulation