🤖 AI Summary
This work addresses semantic instance segmentation with consistent instance identities across temporally sparse indoor 3D scans, in which objects may move, appear, or disappear between captures. The authors formalize the task of temporally sparse 4D indoor semantic instance segmentation (4DSIS) and propose ReScene4D, a method that adapts 3D semantic instance segmentation (3DSIS) architectures to this setting. ReScene4D shares context across observations so that segmentation and temporal instance association are handled jointly, which yields temporally consistent instance tracking without high-frequency observations or a separate discrete matching step. The study also introduces t-mAP, a metric that extends mAP to reward temporal identity consistency. ReScene4D achieves state-of-the-art performance on 3RScan, and the shared context also improves standard 3DSIS quality, establishing a new benchmark for understanding evolving indoor scenes.
📝 Abstract
Indoor environments evolve as objects move, appear, or disappear. Capturing these dynamics requires maintaining temporally consistent instance identities across intermittently captured 3D scans, even when changes are unobserved. We introduce and formalize the task of temporally sparse 4D indoor semantic instance segmentation (SIS), which jointly segments, identifies, and temporally associates object instances. This setting poses a challenge for existing 3DSIS methods, which require a discrete matching step due to their lack of temporal reasoning, and for 4D LiDAR approaches, which perform poorly due to their reliance on high-frequency temporal measurements that are uncommon in the longer-horizon evolution of indoor environments. We propose ReScene4D, a novel method that adapts 3DSIS architectures for 4DSIS without needing dense observations. It explores strategies to share information across observations, demonstrating that this shared context not only enables consistent instance tracking but also improves standard 3DSIS quality. To evaluate this task, we define a new metric, t-mAP, that extends mAP to reward temporal identity consistency. ReScene4D achieves state-of-the-art performance on the 3RScan dataset, establishing a new benchmark for understanding evolving indoor scenes.
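The abstract does not give the definition of t-mAP, so the following is only an illustrative sketch of how an overlap score can be made sensitive to identity consistency across scans. It assumes a simple pooled intersection-over-union, in which intersections and unions are summed over all scans for one predicted identity and one ground-truth identity. The names `Track` and `temporal_iou` are hypothetical and are not taken from the paper.

```python
from typing import Dict, Set

# A "track" maps a scan index to the set of point indices that carry
# one instance ID in that scan. Hypothetical representation.
Track = Dict[int, Set[int]]


def temporal_iou(pred: Track, gt: Track) -> float:
    """Pooled IoU over all scans (an assumption, not the paper's t-mAP)."""
    scans = set(pred) | set(gt)
    inter = sum(len(pred.get(s, set()) & gt.get(s, set())) for s in scans)
    union = sum(len(pred.get(s, set()) | gt.get(s, set())) for s in scans)
    return inter / union if union else 0.0


# Ground truth: one chair observed in two scans captured at different times.
gt_chair = {0: {1, 2, 3, 4}, 1: {10, 11, 12, 13}}

# Prediction A keeps a single identity across both scans.
pred_consistent = {0: {1, 2, 3, 4}, 1: {10, 11, 12, 13}}

# Prediction B segments scan 0 perfectly but assigns a new ID in scan 1,
# so the track carrying the original ID covers scan 0 only.
pred_id_switch = {0: {1, 2, 3, 4}}

iou_consistent = temporal_iou(pred_consistent, gt_chair)
iou_switch = temporal_iou(pred_id_switch, gt_chair)
```

Under this assumed score, a prediction with perfect per-scan masks but a switched identity is penalized, whereas a per-scan 3D IoU would not distinguish the two cases. A thresholded mAP computed on such a score would therefore reward temporal consistency, which is the property the abstract attributes to t-mAP.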