🤖 AI Summary
Existing road-obstacle segmentation methods predominantly operate on single frames, neglecting temporal consistency and therefore suffering from severe prediction flickering in long videos. This work systematically establishes the temporal, video-level nature of the task and curates and adapts four long-sequence video benchmarks, on which 11 state-of-the-art image- and video-based segmentation methods are evaluated for temporal robustness. Building on vision foundation models, the authors propose two strong baselines that explicitly model inter-frame consistency via spatiotemporal feature fusion. These approaches set a new state of the art in road-obstacle video segmentation on long-range sequences, improving both segmentation accuracy and temporal stability, and the work as a whole provides a standardized evaluation framework and a principled direction for future research on video segmentation in autonomous driving.
📝 Abstract
With the growing deployment of autonomous driving agents, the detection and segmentation of road obstacles have become critical for ensuring safe autonomous navigation. However, existing road-obstacle segmentation methods operate on individual frames, overlooking the temporal nature of the problem and producing inconsistent prediction maps across consecutive frames. In this work, we demonstrate that road-obstacle segmentation is inherently temporal, since the segmentation maps of consecutive frames are strongly correlated. To address this, we curate and adapt four evaluation benchmarks for road-obstacle video segmentation and evaluate 11 state-of-the-art image- and video-based segmentation methods on them. Moreover, we introduce two strong baseline methods built on vision foundation models. Our approach establishes a new state of the art in road-obstacle video segmentation on long-range video sequences, providing valuable insights and direction for future research.