🤖 AI Summary
To address the trade-off between computational redundancy and accuracy in multi-frame point cloud 3D semantic segmentation for autonomous driving, this paper proposes an end-to-end trainable framework. First, a feature-level temporal aggregation mechanism efficiently fuses multi-frame information in the bird's-eye-view (BEV) feature space. Second, feature regularization constraints mitigate drift in the aggregated historical features. Third, a lightweight MLP-based point decoder avoids upsampling redundant points from past frames. The core contribution is the first feature-level multi-frame aggregation and regularization paradigm for this task. Evaluated on nuScenes and the Waymo Open Dataset, the method achieves state-of-the-art (SOTA) mean Intersection-over-Union (mIoU) while improving inference speed by 23% and reducing GPU memory consumption by 37%.
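The two mechanisms above can be sketched in a minimal NumPy form. This is an illustrative toy, not the paper's implementation: the function names (`aggregate_bev`, `mlp_point_decoder`), the fixed blending weight `alpha` standing in for a learned fusion gate, and the integer-cell ego-motion warp via `np.roll` are all assumptions made for clarity. The key structural point it shows is that past frames contribute only a warped BEV feature map, and the decoder touches only the current frame's points.

```python
import numpy as np

def aggregate_bev(prev_bev, curr_bev, shift, alpha=0.5):
    """Fuse the previous frame's BEV features into the current frame.

    prev_bev, curr_bev: (C, H, W) feature maps; shift: (dy, dx) integer
    ego-motion offset in BEV cells. A real model would learn the fusion
    gate; a fixed convex combination stands in for it here.
    """
    warped = np.roll(prev_bev, shift=shift, axis=(1, 2))  # crude ego-motion warp
    return alpha * curr_bev + (1.0 - alpha) * warped

def mlp_point_decoder(bev, points, w1, b1, w2, b2):
    """Per-point semantic logits from fused BEV features via a tiny MLP.

    bev: (C, H, W); points: (N, 2) integer (row, col) BEV cells of the
    *current* frame only. Each point gathers its BEV feature and passes
    through a 2-layer MLP, so no dense upsampling of past-frame points
    is ever needed.
    """
    feats = bev[:, points[:, 0], points[:, 1]].T   # (N, C) gathered features
    hidden = np.maximum(feats @ w1 + b1, 0.0)      # ReLU hidden layer
    return hidden @ w2 + b2                        # (N, num_classes) logits

# Usage sketch with random weights (shapes only; no trained model implied)
rng = np.random.default_rng(0)
curr = rng.standard_normal((8, 16, 16))
prev = rng.standard_normal((8, 16, 16))
fused = aggregate_bev(prev, curr, shift=(1, -2))
pts = np.array([[0, 0], [5, 7], [15, 15]])
logits = mlp_point_decoder(fused, pts,
                           rng.standard_normal((8, 32)), np.zeros(32),
                           rng.standard_normal((32, 4)), np.zeros(4))
```

In the real framework the regularization constraints would additionally penalize divergence between `warped` and the current features during training; that loss term is omitted here for brevity.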
📝 Abstract
We propose MFSeg, an efficient multi-frame 3D semantic segmentation framework. By aggregating point cloud sequences at the feature level and regularizing the feature extraction and aggregation process, MFSeg reduces computational overhead while maintaining high accuracy. Moreover, by employing a lightweight MLP-based point decoder, our method eliminates the need to upsample redundant points from past frames. Experiments on the nuScenes and Waymo datasets show that MFSeg outperforms existing methods, demonstrating its effectiveness and efficiency.