Reconstruction Using the Invisible: Intuition from NIR and Metadata for Enhanced 3D Gaussian Splatting

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address poor robustness and semantic deficiency in 3D reconstruction under agricultural conditions (non-uniform illumination, severe occlusion, and limited field of view), this paper proposes NIRSplat, the first multimodal 3D Gaussian Splatting framework to integrate near-infrared (NIR) imagery with textual metadata derived from vegetation indices (NDVI, NDWI, chlorophyll index). The authors introduce NIRPlant, the first multimodal agricultural dataset comprising synchronized NIR/RGB images, LiDAR point clouds, and structured textual vegetation metadata; design a cross-modal fusion architecture built on cross-attention and 3D point-based positional encoding; and incorporate geometric priors to regularize optimization. Experiments show that NIRSplat significantly outperforms 3DGS, CoR-GS, and InstantSplat in reconstruction completeness and geometric accuracy while improving semantic interpretability. The code and the NIRPlant dataset are publicly released.

📝 Abstract
While 3D Gaussian Splatting (3DGS) has rapidly advanced, its application in agriculture remains underexplored. Agricultural scenes present unique challenges for 3D reconstruction methods, particularly due to uneven illumination, occlusions, and a limited field of view. To address these limitations, we introduce NIRPlant, a novel multimodal dataset encompassing Near-Infrared (NIR) imagery, RGB imagery, textual metadata, depth, and LiDAR data collected under varied indoor and outdoor lighting conditions. By integrating NIR data, our approach enhances robustness and provides crucial botanical insights that extend beyond the visible spectrum. Additionally, we leverage text-based metadata derived from vegetation indices, such as NDVI, NDWI, and the chlorophyll index, which significantly enriches the contextual understanding of complex agricultural environments. To fully exploit these modalities, we propose NIRSplat, an effective multimodal Gaussian splatting architecture employing a cross-attention mechanism combined with 3D point-based positional encoding, providing robust geometric priors. Comprehensive experiments demonstrate that NIRSplat outperforms existing landmark methods, including 3DGS, CoR-GS, and InstantSplat, highlighting its effectiveness in challenging agricultural scenarios. The code and dataset are publicly available at: https://github.com/StructuresComp/3D-Reconstruction-NIR
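The vegetation indices named in the abstract follow standard band-ratio formulas. A minimal sketch (this illustrates the textbook definitions, not the paper's exact metadata pipeline; the green chlorophyll index variant is an assumption, since the abstract does not specify which chlorophyll index is used):

```python
# Standard band-ratio vegetation indices computed from per-pixel reflectances.
# Band choices and the chlorophyll-index variant are illustrative assumptions.

def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red) if (nir + red) else 0.0

def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index (McFeeters): (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir) if (green + nir) else 0.0

def chlorophyll_index_green(nir: float, green: float) -> float:
    """Green chlorophyll index: NIR / Green - 1."""
    return nir / green - 1.0

# Example reflectances for a healthy-vegetation pixel (high NIR, low red):
print(round(ndvi(0.8, 0.1), 3))  # -> 0.778
```

Scalar values like these are what would be serialized into the structured textual metadata accompanying each capture.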
Problem

Research questions and friction points this paper is trying to address.

Enhancing 3D reconstruction in challenging agricultural environments
Addressing limitations from uneven illumination and occlusions
Integrating NIR imagery and metadata for botanical insights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses NIR data for robustness and botanical insights
Integrates text metadata from vegetation indices
Employs cross-attention with positional encoding architecture
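The cross-attention-with-positional-encoding idea in the last bullet can be sketched in miniature. This is an illustrative toy, not the authors' implementation: token shapes, the Fourier-feature encoding, and the choice of NIR tokens as queries over RGB tokens are all assumptions for demonstration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def positional_encoding(xyz, num_freqs=2):
    """Fourier features of a 3D point: sin/cos at doubling frequencies per axis."""
    feats = []
    for c in xyz:
        for k in range(num_freqs):
            feats.append(math.sin((2 ** k) * c))
            feats.append(math.cos((2 ** k) * c))
    return feats  # length 3 * 2 * num_freqs

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query mixes the value tokens."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# Toy example: 2 NIR query tokens attend over 3 RGB tokens whose features are
# concatenated with the positional encoding of their 3D point centers.
rgb_feats = [[0.2, 0.9], [0.8, 0.1], [0.5, 0.5]]
centers = [(0.0, 0.1, 0.2), (1.0, 0.0, 0.5), (0.3, 0.7, 0.9)]
keys = [f + positional_encoding(c) for f, c in zip(rgb_feats, centers)]
values = keys
queries = [[0.1] * len(keys[0]), [0.5] * len(keys[0])]
fused = cross_attention(queries, keys, values)
print(len(fused), len(fused[0]))  # -> 2 14
```

Injecting point coordinates through the positional encoding is what lets the attention weights respect scene geometry rather than appearance alone, which is the role the paper assigns to its 3D point-based priors.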
Gyusam Chang
Korea University, University of California, Los Angeles
Tuan-Anh Vu
University of California, Los Angeles
Vivek Alumootil
University of California, Los Angeles
Harris Song
University of California, Los Angeles
Deanna Pham
University of California, Los Angeles
Sangpil Kim
Korea University
Computer Vision
M. Khalid Jawed
UCLA (Structures-Computer Interaction Lab)
Solid and structural mechanics, robotics, physics-assisted machine learning