AnchorWeave: World-Consistent Video Generation with Retrieved Local Spatial Memories

📅 2026-02-16
🤖 AI Summary
This work addresses long-term spatial consistency in camera-controllable video generation, in particular the cross-view misalignment and noise accumulation that arise when a global 3D scene is reconstructed from multiple views. To avoid error-prone global 3D reconstruction, the authors propose AnchorWeave, a memory-augmented generation framework built on multiple clean local geometric memories. The method retrieves local memories through coverage-driven retrieval aligned with the target trajectory and fuses the selected memories with a multi-anchor weaving controller. According to the authors, the resulting framework significantly improves long-horizon scene consistency while maintaining strong visual quality. Ablation studies support the contribution of local geometric conditioning, coverage-driven retrieval, and multi-anchor control.

📝 Abstract
Maintaining spatial world consistency over long horizons remains a central challenge for camera-controllable video generation. Existing memory-based approaches often condition generation on globally reconstructed 3D scenes by rendering anchor videos from the reconstructed geometry in the history. However, reconstructing a global 3D scene from multiple views inevitably introduces cross-view misalignment, as pose and depth estimation errors cause the same surfaces to be reconstructed at slightly different 3D locations across views. When fused, these inconsistencies accumulate into noisy geometry that contaminates the conditioning signals and degrades generation quality. We introduce AnchorWeave, a memory-augmented video generation framework that replaces a single misaligned global memory with multiple clean local geometric memories and learns to reconcile their cross-view inconsistencies. To this end, AnchorWeave performs coverage-driven local memory retrieval aligned with the target trajectory and integrates the selected local memories through a multi-anchor weaving controller during generation. Extensive experiments demonstrate that AnchorWeave significantly improves long-term scene consistency while maintaining strong visual quality, with ablation and analysis studies further validating the effectiveness of local geometric conditioning, multi-anchor control, and coverage-driven retrieval.
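
The abstract does not spell out how coverage-driven retrieval is computed. As a rough illustration only, one plausible reading is a greedy maximum-coverage selection: discretize what the target trajectory sees into cells, then repeatedly pick the local memory that covers the most cells not yet covered. Everything in the sketch below (the function name `greedy_coverage_retrieval`, the cell-set representation, the `max_anchors` budget) is a hypothetical assumption for illustration, not the authors' implementation.

```python
from typing import Dict, List, Set


def greedy_coverage_retrieval(
    memory_coverage: Dict[str, Set[int]],
    target_cells: Set[int],
    max_anchors: int,
) -> List[str]:
    """Pick up to max_anchors memories that together cover the most target cells.

    memory_coverage maps a (hypothetical) memory id to the set of scene cells
    it observes; target_cells are the cells visible along the target trajectory.
    """
    uncovered = set(target_cells)
    selected: List[str] = []
    while uncovered and len(selected) < max_anchors:
        # Choose the unselected memory that covers the most still-uncovered cells.
        best = max(
            (m for m in memory_coverage if m not in selected),
            key=lambda m: len(memory_coverage[m] & uncovered),
            default=None,
        )
        if best is None or not (memory_coverage[best] & uncovered):
            break  # no remaining memory adds coverage
        selected.append(best)
        uncovered -= memory_coverage[best]
    return selected


# Toy example: cells 0..5 stand for regions visible along the target trajectory.
memories = {
    "mem_a": {0, 1, 2},
    "mem_b": {2, 3},
    "mem_c": {3, 4, 5},
    "mem_d": {1, 2},
}
anchors = greedy_coverage_retrieval(memories, target_cells=set(range(6)), max_anchors=2)
print(anchors)
```

In this toy setup the two selected memories jointly cover all six cells, while the redundant memories are skipped. A selection of this kind would keep each retrieved memory locally clean and leave the reconciliation of overlaps to the generator, which is the role the abstract assigns to the multi-anchor weaving controller.
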
Problem

Research questions and friction points this paper is trying to address.

world consistency
video generation
3D reconstruction
cross-view misalignment
spatial memory
Innovation

Methods, ideas, or system contributions that make the work stand out.

local geometric memories
multi-anchor weaving
coverage-driven retrieval
world-consistent video generation
memory-augmented generation