Deep Common Feature Mining for Efficient Video Semantic Segmentation

📅 2024-03-05
🏛️ IEEE Transactions on Circuits and Systems for Video Technology (Print)
📈 Citations: 0 · Influential: 0
🤖 AI Summary
To address high inter-frame computational redundancy and unreliable temporal feature propagation in video semantic segmentation, this paper proposes the Deep Common Feature Mining (DCFM) framework. Methodologically, DCFM introduces three key innovations: (1) a dual-path decoupled architecture that explicitly separates shared high-level semantic features, extracted from key frames, from frame-specific dynamic details extracted from every frame; (2) a symmetric supervision strategy tailored to sparse frame-level annotations; and (3) a self-supervised loss that reinforces intra-class feature similarity and temporal consistency. Evaluated on VSPW and Cityscapes, DCFM attains competitive mIoU while significantly reducing FLOPs and accelerating inference, demonstrating a strong trade-off between accuracy and efficiency.

📝 Abstract
Recent advancements in video semantic segmentation have made substantial progress by exploiting temporal correlations. Nevertheless, persistent challenges, including redundant computation and the reliability of the feature propagation process, underscore the need for further innovation. In response, we present Deep Common Feature Mining (DCFM), a novel approach strategically designed to address these challenges by leveraging the concept of feature sharing. DCFM explicitly decomposes features into two complementary components. The common representation extracted from a key-frame furnishes essential high-level information to neighboring non-key frames, allowing for direct re-utilization without feature propagation. Simultaneously, the independent feature, derived from each video frame, captures rapidly changing information, providing frame-specific clues crucial for segmentation. To achieve such decomposition, we employ a symmetric training strategy tailored for sparsely annotated data, empowering the backbone to learn a robust high-level representation enriched with common information. Additionally, we incorporate a self-supervised loss function to reinforce intra-class feature similarity and enhance temporal consistency. Experimental evaluations on the VSPW and Cityscapes datasets demonstrate the effectiveness of our method, showing a superior balance between accuracy and efficiency. The implementation is available at https://github.com/BUAAHugeGun/DCFM.
Problem

Research questions and friction points this paper is trying to address.

Video Segmentation
Redundant Computation
Reliability of Information Transmission
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep Common Feature Mining (DCFM)
Efficient Video Segmentation
Consistency and Accuracy Enhancement
Yaoyan Zheng
BUAA
Computer Vision · Deep Learning · Digital Image Processing
Hongyu Yang
Institute of Artificial Intelligence, Beihang University, Beijing 100191, China, and also with Shanghai Artificial Intelligence Laboratory, Shanghai 201112, China
Di Huang
State Key Laboratory of Software Development Environment, School of Computer Science and Engineering, Beihang University, Beijing 100191, China, and also with the Zhejiang Industrial Big Data and Robot Intelligent System Key Laboratory, Hangzhou Innovation Institute, Beihang University, Hangzhou 310051, China