AI Summary
This work addresses the absence of publicly available multimodal datasets that jointly provide high-density LiDAR, high-resolution oblique imagery, and fine-grained 3D semantic annotations for power line infrastructure. To fill this gap, we introduce GridNet-HD, a new benchmark dataset comprising 2.5 billion LiDAR points and 7,694 oblique images, annotated with 11 semantic classes in 3D; it is the first publicly released dataset of its kind. Using this dataset, we develop both single-modality and multimodal fusion baselines for 3D semantic segmentation and show that geometric and appearance cues are complementary. In our experiments, the multimodal model improves mean Intersection over Union (mIoU) by 5.55 points over the best single-modality baseline, yielding more accurate 3D semantic segmentation of power line assets.
Abstract
This paper presents GridNet-HD, a multi-modal dataset for 3D semantic segmentation of overhead electrical infrastructure that pairs high-density LiDAR with high-resolution oblique imagery. As reviewed in Sec. 2, no public dataset jointly provides high-density LiDAR and high-resolution oblique imagery with 3D semantic labels for power-line assets. The dataset comprises 7,694 images and 2.5 billion points annotated into 11 classes, with predefined splits and mIoU-based evaluation. Unimodal (LiDAR-only and image-only) and multi-modal fusion baselines are provided. On GridNet-HD, fusion models outperform the best unimodal baseline by +5.55 mIoU, highlighting the complementarity of geometry and appearance. The dataset, baselines, and code are available at https://huggingface.co/collections/heig-vd-geo/gridnet-hd.
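Results above are reported as mIoU. The following is a minimal sketch, not the official GridNet-HD evaluation script. For each class c, IoU_c = TP_c / (TP_c + FP_c + FN_c), counted over annotated 3D points, and mIoU is the mean of IoU_c over the classes (11 for GridNet-HD). The function names, the `ignore_index` argument, and the rule of skipping classes that appear in neither the prediction nor the ground truth are assumptions made for illustration.

```python
import numpy as np


def confusion_matrix(labels, preds, num_classes, ignore_index=None):
    """Accumulate a num_classes x num_classes confusion matrix.

    Rows are ground-truth classes, columns are predicted classes.
    Assumes class IDs are integers in [0, num_classes); points whose label
    equals the (hypothetical) ignore_index are dropped before counting.
    """
    labels = np.asarray(labels, dtype=np.int64).ravel()
    preds = np.asarray(preds, dtype=np.int64).ravel()
    if ignore_index is not None:
        keep = labels != ignore_index
        labels, preds = labels[keep], preds[keep]
    # Encode each (label, pred) pair as one index and count occurrences.
    flat = labels * num_classes + preds
    counts = np.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def mean_iou(conf):
    """Return (mIoU, per-class IoU) from a confusion matrix.

    IoU_c = TP / (TP + FP + FN). Classes with an empty union (absent from
    both ground truth and prediction) get NaN and are excluded from the mean;
    this convention is an assumption and may differ from the benchmark's.
    """
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp  # predicted as c, but labelled otherwise
    fn = conf.sum(axis=1) - tp  # labelled c, but predicted otherwise
    union = tp + fp + fn
    iou = np.full(conf.shape[0], np.nan)
    valid = union > 0
    iou[valid] = tp[valid] / union[valid]
    return float(np.nanmean(iou)), iou


if __name__ == "__main__":
    # Toy example with 3 classes and 6 points.
    conf = confusion_matrix([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0], num_classes=3)
    miou, per_class = mean_iou(conf)
    print(conf.tolist())  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
    print(miou)           # 0.5
```

A gain such as "+5.55 mIoU" is conventionally the difference of two such means expressed in percentage points, i.e. 100 * (mIoU_fusion - mIoU_best_unimodal). Accumulating one confusion matrix across the whole test split before computing IoU, rather than averaging per-tile scores, keeps large and small tiles weighted by their point counts.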