🤖 AI Summary
To address the scarcity of labeled point clouds for semantic segmentation of unimproved-road scenes (only 50 annotated samples from the target domain), the paper proposes a two-stage, data-efficient training framework. A projection-based CNN is first pre-trained on a mixture of public urban datasets and a small, curated in-domain dataset; a lightweight prediction head is then fine-tuned exclusively on in-domain data. Along the way, the study explores applying Point Prompt Training to batch normalization layers, using Manifold Mixup as a regularizer, and incorporating histogram-normalized ambients. Across eight classes (unimproved roads plus seven others), the approach raises mean Intersection-over-Union (mIoU) from 33.5% to 51.8% (+18.3 percentage points) and overall accuracy from 85.5% to 90.8%, relative to naive training on the in-domain data alone. The results indicate that pre-training across multiple datasets is key to generalization under limited in-domain supervision.
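For readers less familiar with the two reported metrics, the following is a minimal sketch of how mean IoU and overall accuracy are commonly derived from a confusion matrix. The function name `segmentation_metrics` and the toy 3-class matrix are illustrative assumptions, not taken from the paper or its code.

```python
def segmentation_metrics(confusion):
    """Return (mean IoU, overall accuracy) for a square confusion matrix.

    confusion[i][j] is the number of points whose true class is i and
    whose predicted class is j. (Hypothetical helper for illustration.)
    """
    num_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[c][c] for c in range(num_classes))

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                  # true c, predicted otherwise
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp   # predicted c, true otherwise
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / union)

    mean_iou = sum(ious) / len(ious) if ious else 0.0
    overall_accuracy = correct / total if total else 0.0
    return mean_iou, overall_accuracy


# Toy 3-class example (made-up counts, not results from the paper).
toy_confusion = [
    [8, 2, 0],
    [1, 7, 2],
    [0, 1, 9],
]
toy_miou, toy_oa = segmentation_metrics(toy_confusion)
```

Because overall accuracy is weighted by point counts while mIoU averages over classes, frequent classes can keep accuracy high even when rarer classes are segmented poorly, which is one common reason the two numbers differ as much as they do here.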
📝 Abstract
In this case study, we present a data-efficient point cloud segmentation pipeline and training framework for robust segmentation of unimproved roads and seven other classes. Our method employs a two-stage training framework: first, a projection-based convolutional neural network is pre-trained on a mixture of public urban datasets and a small, curated in-domain dataset; then, a lightweight prediction head is fine-tuned exclusively on in-domain data. Along the way, we explore the application of Point Prompt Training to batch normalization layers and the effects of Manifold Mixup as a regularizer within our pipeline. We also explore the effects of incorporating histogram-normalized ambients to further boost performance. Using only 50 labeled point clouds from our target domain, we show that our proposed training approach improves mean Intersection-over-Union from 33.5% to 51.8% and the overall accuracy from 85.5% to 90.8%, when compared to naive training on the in-domain data. Crucially, our results demonstrate that pre-training across multiple datasets is key to improving generalization and enabling robust segmentation under limited in-domain supervision. Overall, this study demonstrates a practical framework for robust 3D semantic segmentation in challenging, low-data scenarios. Our code is available at: https://github.com/andrewyarovoi/MD-FRNet.
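The core mixing step of Manifold Mixup can be sketched as follows. This is a generic illustration, not the authors' implementation: the function name `manifold_mixup`, the default `alpha`, and the flat feature vectors are all assumptions, and the abstract does not state which layer is mixed or how point-wise labels are handled.

```python
import random


def manifold_mixup(hidden_a, hidden_b, target_a, target_b,
                   alpha=2.0, lam=None, rng=random):
    """Interpolate two hidden feature vectors and their soft targets.

    Hypothetical helper: a single coefficient `lam` is drawn from
    Beta(alpha, alpha) (unless given explicitly) and used to mix both
    the intermediate features and the one-hot targets.
    """
    if lam is None:
        lam = rng.betavariate(alpha, alpha)
    mixed_hidden = [lam * a + (1.0 - lam) * b
                    for a, b in zip(hidden_a, hidden_b)]
    mixed_target = [lam * a + (1.0 - lam) * b
                    for a, b in zip(target_a, target_b)]
    return mixed_hidden, mixed_target, lam


# Two toy samples with 2-dimensional hidden features and one-hot labels.
mixed_h, mixed_t, used_lam = manifold_mixup(
    [1.0, 0.0], [0.0, 1.0],   # hidden features of samples A and B
    [1.0, 0.0], [0.0, 1.0],   # one-hot targets of samples A and B
    lam=0.25,                 # fixed here so the example is deterministic
)
```

In a real network the interpolation would be applied to the activations of an intermediate layer during the forward pass, with the remaining layers run on the mixed activations and the loss computed against the mixed targets.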