🤖 AI Summary
To address the high computational cost and low representation efficiency of monocular 3D scene understanding for collective perception in autonomous driving, this paper proposes a lightweight monocular 3D scene representation method. The approach combines fine-grained 3D Stixel units with a learnable clustering mechanism, enabling semantic-aware adaptive clustering that compresses the scene representation while improving object segmentation accuracy. A lightweight neural network takes a single RGB image as input and jointly leverages depth estimation and automatically generated, LiDAR-based ground truth to efficiently produce Stixel representations, which natively support multiple output modalities including point clouds and bird's-eye-view (BEV) maps. Evaluated on the Waymo Open Dataset within a 30-meter range, the method achieves competitive performance with inference times as low as 10 ms per frame, balancing real-time efficiency, accuracy, and compatibility with collective perception systems.
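The sketch below illustrates the general idea of a 3D Stixel record and how such a representation can be converted into point-cloud and BEV outputs. The field names, the pinhole back-projection, and the grid parameters are illustrative assumptions for exposition, not the paper's actual data structures or interfaces.

```python
# Hypothetical sketch: a 3D Stixel record and its conversion to point-cloud
# and BEV outputs. All names and parameters are assumptions, not the paper's API.
from dataclasses import dataclass
import numpy as np


@dataclass
class Stixel3D:
    u: int          # image column of the Stixel (pixels)
    v_top: int      # top row of the Stixel (pixels)
    v_bottom: int   # bottom row of the Stixel (pixels)
    depth: float    # estimated distance along the camera z-axis (meters)
    label: int      # semantic / instance id assigned by clustering


def stixels_to_points(stixels, fx, fy, cx, cy, rows_per_point=4):
    """Back-project each Stixel into a sparse column of 3D points (pinhole model)."""
    points = []
    for s in stixels:
        for v in range(s.v_top, s.v_bottom, rows_per_point):
            x = (s.u - cx) * s.depth / fx
            y = (v - cy) * s.depth / fy
            points.append((x, y, s.depth, s.label))
    return np.array(points, dtype=np.float32)


def stixels_to_bev(stixels, fx, cx, grid_size=60.0, resolution=0.2):
    """Rasterize Stixel footprints into a square BEV occupancy grid (camera frame)."""
    cells = int(grid_size / resolution)
    bev = np.zeros((cells, cells), dtype=np.uint8)
    for s in stixels:
        x = (s.u - cx) * s.depth / fx          # lateral offset in meters
        col = int(x / resolution + cells / 2)  # center the grid laterally on the camera
        row = int(s.depth / resolution)        # forward distance bin
        if 0 <= row < cells and 0 <= col < cells:
            bev[row, col] = 1
    return bev


if __name__ == "__main__":
    demo = [Stixel3D(u=640, v_top=300, v_bottom=420, depth=12.5, label=1)]
    pts = stixels_to_points(demo, fx=2000.0, fy=2000.0, cx=960.0, cy=640.0)
    print(pts.shape, stixels_to_bev(demo, fx=2000.0, cx=960.0).sum())
```

Because each Stixel stores only a column, a vertical extent, and a depth, the full scene compresses to a few hundred such records while still expanding into either output modality on demand.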
📝 Abstract
This paper presents StixelNExT++, a novel approach to scene representation for monocular perception systems. Building on the established Stixel representation, our method infers 3D Stixels and enhances object segmentation by clustering smaller 3D Stixel units. The approach achieves high compression of scene information while remaining adaptable to point cloud and bird's-eye-view representations. Our lightweight neural network, trained on automatically generated LiDAR-based ground truth, achieves real-time performance with computation times as low as 10 ms per frame. Experimental results on the Waymo dataset demonstrate competitive performance within a 30-meter range, highlighting the potential of StixelNExT++ for collective perception in autonomous systems.
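To make the "clustering smaller 3D Stixel units" step concrete, here is a minimal stand-in that groups neighboring Stixel columns into object segments based on column adjacency and depth continuity. The paper describes a learnable clustering mechanism; this greedy distance-based sweep is only an illustrative approximation of what the grouping produces, with thresholds chosen arbitrarily.

```python
# Illustrative stand-in for the object grouping step (not the paper's learned
# clustering): merge adjacent Stixel columns whose depths are continuous.
import numpy as np


def cluster_stixels(columns, depths, max_gap=1, max_depth_jump=1.0):
    """Assign a cluster id to each Stixel from column adjacency and depth continuity.

    columns: 1D array of image-column indices, assumed sorted ascending.
    depths:  1D array of per-Stixel depths in meters.
    """
    labels = np.zeros(len(columns), dtype=np.int32)
    current = 0
    for i in range(1, len(columns)):
        col_gap = columns[i] - columns[i - 1]
        depth_jump = abs(depths[i] - depths[i - 1])
        if col_gap > max_gap or depth_jump > max_depth_jump:
            current += 1          # start a new object segment
        labels[i] = current
    return labels


if __name__ == "__main__":
    cols = np.array([10, 11, 12, 40, 41, 42])
    deps = np.array([8.0, 8.1, 8.0, 20.0, 20.2, 20.1])
    print(cluster_stixels(cols, deps))   # -> [0 0 0 1 1 1]
```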