🤖 AI Summary
This work addresses the challenge of efficient 3D scene understanding and lifelong mapping in dynamic environments with repeated revisits. Inspired by human cognition, the authors propose a memory-driven mapping framework that constructs a static scene memory bank and leverages multi-stage motion cues to identify dynamic objects. Upon revisiting a scene, the system performs memory recall, camera relocalization, and memory updating. The approach integrates human-like cognitive mechanisms into 3D mapping by fusing depth estimation, camera pose priors, and image sequences within a factor graph optimization framework, enabling consistent cross-session scene understanding. Experiments demonstrate state-of-the-art performance in video depth estimation, pose reconstruction, and 3D mapping, with significant gains in mapping efficiency and consistency over long-term, multi-visit scenarios.
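To build intuition for the factor graph optimization mentioned above, here is a toy 1D pose-graph sketch: poses are scalars, each factor constrains the relative motion between two poses, and Gauss-Seidel sweeps minimize the squared residuals. This is purely illustrative; the paper's actual formulation fuses depth estimates, camera pose priors, and image sequences over full 6-DoF poses, and the function and factor layout below are assumptions, not the authors' API.

```python
def optimize_poses(n, factors, iters=200):
    """Refine n 1D poses given relative-motion factors (i, j, z_ij),
    minimizing sum of (x_j - x_i - z_ij)^2 with x_0 anchored at 0.
    Uses simple Gauss-Seidel sweeps (coordinate descent on the quadratic)."""
    x = [0.0] * n
    for _ in range(iters):
        for k in range(1, n):  # x[0] stays anchored as the gauge constraint
            num, den = 0.0, 0
            for i, j, z in factors:
                if j == k:          # factor predicts x_k = x_i + z
                    num += x[i] + z
                    den += 1
                if i == k:          # factor predicts x_k = x_j - z
                    num += x[j] - z
                    den += 1
            if den:
                x[k] = num / den    # stationary point of the local quadratic
    return x
```

With an odometry chain (0→1, 1→2) plus a slightly inconsistent "loop closure" factor (0→2), the solver spreads the 0.2 discrepancy across the chain rather than trusting any single measurement:

```python
x = optimize_poses(3, [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 2.2)])
# x[1] and x[2] settle between the raw odometry and the loop-closure estimate
```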
📝 Abstract
We present CogniMap3D, a bioinspired framework for dynamic 3D scene understanding and reconstruction that emulates human cognitive processes. Our approach maintains a persistent memory bank of static scenes, enabling efficient spatial knowledge storage and rapid retrieval. CogniMap3D integrates three core capabilities: a multi-stage motion cue framework for identifying dynamic objects, a cognitive mapping system for storing, recalling, and updating static scenes across multiple visits, and a factor graph optimization strategy for refining camera poses. Given an image stream, our model identifies dynamic regions through motion cues with depth and camera pose priors, then matches static elements against its memory bank. When revisiting familiar locations, CogniMap3D retrieves stored scenes, relocalizes cameras, and updates memory with new observations. Evaluations on video depth estimation, camera pose reconstruction, and 3D mapping demonstrate state-of-the-art performance while effectively supporting continuous scene understanding across extended sequences and multiple visits.
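The store/recall/update cycle described in the abstract can be sketched as a minimal scene memory bank. This is a hedged illustration under simplifying assumptions: the class name, the use of a single descriptor vector per scene, cosine-similarity recall, and running-mean fusion are all hypothetical stand-ins for the paper's actual memory representation.

```python
import math

def cosine(a, b):
    """Cosine similarity between two descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

class SceneMemoryBank:
    """Toy memory bank: one descriptor per stored static scene,
    with recall-or-insert semantics on each new observation."""
    def __init__(self, match_threshold=0.9):
        self.entries = []  # list of (descriptor, visit_count)
        self.match_threshold = match_threshold

    def recall(self, descriptor):
        """Return the index of the best-matching stored scene, or None if novel."""
        best, best_sim = None, self.match_threshold
        for i, (d, _) in enumerate(self.entries):
            sim = cosine(descriptor, d)
            if sim >= best_sim:
                best, best_sim = i, sim
        return best

    def update(self, descriptor):
        """Fuse a new observation into memory: revisit updates the
        stored entry (running mean); a novel scene is inserted."""
        idx = self.recall(descriptor)
        if idx is None:
            self.entries.append((list(descriptor), 1))
            return len(self.entries) - 1
        old, n = self.entries[idx]
        fused = [(o * n + x) / (n + 1) for o, x in zip(old, descriptor)]
        self.entries[idx] = (fused, n + 1)
        return idx
```

A first visit inserts a new entry; a revisit with a near-identical descriptor is recalled and fused into the same entry rather than duplicated, which is the behavior that keeps long-term, multi-visit mapping consistent:

```python
bank = SceneMemoryBank()
bank.update([1.0, 0.0, 0.0])    # first visit: stored as entry 0
bank.update([0.99, 0.01, 0.0])  # revisit: recalled as entry 0 and fused
```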