StixelNExT++: Lightweight Monocular Scene Segmentation and Representation for Collective Perception

📅 2025-07-09
🤖 AI Summary
To address the high computational cost and limited representational efficiency of monocular 3D scene understanding for collective perception in autonomous driving, this paper proposes a lightweight monocular 3D scene representation method. The approach integrates fine-grained 3D Stixel units with a learnable clustering mechanism, enabling semantic-aware adaptive clustering that compresses the scene representation while improving object segmentation accuracy. A lightweight neural network takes a single RGB image as input and jointly leverages depth estimation and LiDAR-based self-supervised ground truth to efficiently generate Stixel representations, which natively support multimodal outputs including point clouds and bird's-eye-view (BEV) maps. Evaluated on the Waymo Open Dataset within a 30-meter range, the method achieves competitive performance at only 10 ms inference time per frame, balancing real-time efficiency, accuracy, and compatibility with collaborative perception systems.

📝 Abstract
This paper presents StixelNExT++, a novel approach to scene representation for monocular perception systems. Building on the established Stixel representation, our method infers 3D Stixels and enhances object segmentation by clustering smaller 3D Stixel units. The approach achieves high compression of scene information while remaining adaptable to point cloud and bird's-eye-view representations. Our lightweight neural network, trained on automatically generated LiDAR-based ground truth, achieves real-time performance with computation times as low as 10 ms per frame. Experimental results on the Waymo dataset demonstrate competitive performance within a 30-meter range, highlighting the potential of StixelNExT++ for collective perception in autonomous systems.
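The abstract describes Stixels as a highly compressed scene representation that remains convertible to point clouds and BEV maps. As an illustration only, here is a minimal Python sketch of that idea; the field layout, camera intrinsics, and grid parameters are assumptions for this sketch, not the paper's actual data structures.

```python
from dataclasses import dataclass

@dataclass
class Stixel:
    """One vertical Stixel in the image plane (hypothetical layout)."""
    u: int          # image column (px)
    v_top: int      # top row (px)
    v_bottom: int   # bottom row (px)
    depth: float    # metric depth (m)
    label: int      # semantic / instance cluster id

def stixel_to_points(s, fx, fy, cx, cy, step=4):
    """Back-project a Stixel into a sparse column of 3D points (pinhole model)."""
    pts = []
    for v in range(s.v_top, s.v_bottom + 1, step):
        x = (s.u - cx) * s.depth / fx   # lateral offset
        y = (v - cy) * s.depth / fy     # vertical offset
        pts.append((x, y, s.depth))
    return pts

def stixels_to_bev(stixels, fx, cx, cell=0.5, extent=30.0):
    """Rasterize Stixels into a sparse top-down occupancy map, keyed by (ix, iz)."""
    grid = {}
    for s in stixels:
        if s.depth > extent:
            continue  # the paper evaluates within a 30 m range
        x = (s.u - cx) * s.depth / fx
        key = (int(x // cell), int(s.depth // cell))
        grid.setdefault(key, set()).add(s.label)
    return grid
```

The appeal of the representation is visible even in this toy version: a full image column collapses into one small record, yet both 3D points and a BEV cell can be recovered from it on demand.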
Problem

Research questions and friction points this paper is trying to address.

How to enhance monocular scene segmentation via 3D Stixel clustering
How to compress scene data efficiently for real-time perception
How to enable collective perception in autonomous systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight neural network for real-time processing
3D Stixel clustering enhances object segmentation
Adaptable to point cloud and bird's-eye-view
Marcel Vosshans
Institute for Intelligent Systems at Esslingen University of Applied Sciences, Germany

Omar Ait-Aider
Institut Pascal ISPR (Image, Systems of Perception, Robotics), Université Clermont Auvergne INP / CNRS, France

Youcef Mezouar
Professor at SIGMA Clermont, Institut Pascal UMR 6602 CNRS / Université Clermont Auvergne / SIGMA
robot vision, visual servoing

Markus Enzweiler
Professor of Computer Science, Esslingen University of Applied Sciences
Autonomous Systems, Scene Understanding, Deep Learning, Self-Driving