Latest Object Memory Management for Temporally Consistent Video Instance Segmentation

📅 2025-07-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the identity inconsistency caused by objects frequently entering and leaving the scene in long videos, this paper proposes Latest Object Memory Management (LOMM), a memory-based framework for temporally consistent video instance segmentation (VIS). It introduces (1) Latest Object Memory (LOM), which explicitly models whether each object is present in a frame and keeps its memory representation up to date, and (2) Decoupled Object Association (DOA), which handles newly appearing objects separately from objects that already exist in memory. Together, the two components improve index assignment and identity consistency over long sequences. On YouTube-VIS 2022, LOMM achieves a state-of-the-art AP of 54.0.

📝 Abstract
In this paper, we present Latest Object Memory Management (LOMM) for temporally consistent video instance segmentation (VIS), which significantly improves long-term instance tracking. At the core of our method is Latest Object Memory (LOM), which robustly tracks and continuously updates the latest states of objects by explicitly modeling their presence in each frame. This enables consistent tracking and accurate identity management across frames, enhancing both performance and reliability throughout the VIS process. Moreover, we introduce Decoupled Object Association (DOA), a strategy that handles newly appearing objects separately from already existing ones. By leveraging our memory system, DOA accurately assigns object indices, improving matching accuracy and ensuring stable identity consistency, even in dynamic scenes where objects frequently appear and disappear. Extensive experiments and ablation studies demonstrate the superiority of our method over traditional approaches, setting a new benchmark in VIS. Notably, LOMM achieves a state-of-the-art AP score of 54.0 on YouTube-VIS 2022, a dataset known for its challenging long videos. Project page: https://seung-hun-lee.github.io/projects/LOMM/
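The idea of keeping only the latest visible state of each object can be illustrated with a small sketch. This is not the authors' implementation: the class name `LatestObjectMemory`, the per-object presence scores, and the 0.5 threshold are all assumptions made for illustration. The sketch only shows the general pattern of overwriting a memory slot when an object is judged present and leaving it untouched otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class LatestObjectMemory:
    """Hypothetical memory that stores the most recent visible state per object."""

    presence_threshold: float = 0.5  # assumed cutoff, not taken from the paper
    slots: dict = field(default_factory=dict)      # object id -> feature vector
    last_seen: dict = field(default_factory=dict)  # object id -> frame index

    def update(self, frame_idx, features, presence):
        """Overwrite a slot only when the object is judged present in this frame.

        features: dict mapping object id -> feature vector for the current frame
        presence: dict mapping object id -> presence score in [0, 1]
        """
        for obj_id, feat in features.items():
            if presence.get(obj_id, 0.0) >= self.presence_threshold:
                self.slots[obj_id] = list(feat)
                self.last_seen[obj_id] = frame_idx
            # Absent objects keep their previous state, so a later
            # reappearance can still be compared against it.


# Usage: object 2 is absent in frame 1, so its frame-0 state is retained.
memory = LatestObjectMemory()
memory.update(0, {1: [1.0, 0.0], 2: [0.0, 1.0]}, {1: 0.9, 2: 0.8})
memory.update(1, {1: [0.9, 0.1], 2: [0.5, 0.5]}, {1: 0.95, 2: 0.1})
```

Gating the update on presence is one plausible way to avoid polluting the memory with features from frames in which an object is occluded or out of view; the paper may realize this differently.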
Problem

Research questions and friction points this paper is trying to address.

Improving long-term instance tracking in video segmentation
Managing object identities consistently across frames
Handling dynamic scenes in which objects frequently appear and disappear
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latest Object Memory (LOM) tracks and updates the latest object states by explicitly modeling per-frame presence
Decoupled Object Association (DOA) handles newly appearing and already existing objects separately
LOMM achieves state-of-the-art AP score on YouTube-VIS 2022
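To make the decoupling of new and existing objects concrete, here is a minimal sketch under stated assumptions. It is not the paper's DOA procedure: the function name `decoupled_associate`, the cosine-similarity score, the greedy one-to-one matching, and the `new_threshold` value are all hypothetical choices. The sketch first decides which detections look new, then matches only the remaining detections against stored object features, and finally issues fresh indices.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def decoupled_associate(memory, detections, new_threshold=0.5):
    """Return one object index per detection.

    memory: dict mapping existing object id -> stored feature vector
    detections: list of feature vectors from the current frame
    new_threshold: assumed similarity cutoff below which a detection is new
    """
    sims = [{oid: cosine(det, feat) for oid, feat in memory.items()}
            for det in detections]

    # Step 1 (novelty): a detection unlike every stored object is treated as new.
    is_new = [max(row.values(), default=0.0) < new_threshold for row in sims]

    # Step 2 (matching): greedy one-to-one matching among non-new detections.
    pairs = sorted(
        ((s, d, oid) for d, row in enumerate(sims) if not is_new[d]
         for oid, s in row.items()),
        key=lambda t: t[0],
        reverse=True,
    )
    assigned, used = {}, set()
    for _, d, oid in pairs:
        if d not in assigned and oid not in used:
            assigned[d] = oid
            used.add(oid)

    # Step 3: new detections (and any left unmatched) receive fresh indices.
    next_id = max(memory, default=-1) + 1
    ids = []
    for d in range(len(detections)):
        if d not in assigned:
            assigned[d] = next_id
            next_id += 1
        ids.append(assigned[d])
    return ids


# Usage: two detections match stored objects, the third is assigned a new index.
stored = {0: [1.0, 0.0], 1: [0.0, 1.0]}
frame = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
indices = decoupled_associate(stored, frame)  # [1, 0, 2]
```

Separating the novelty decision from matching means a newly appearing object is never forced onto an existing index just because it is the least bad match, which is the failure mode that decoupling is meant to avoid.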