🤖 AI Summary
Existing approaches to fusing RGB and thermal infrared data often overlook cross-modal correspondences or rely on shared representations that ignore the distinct physical characteristics of each modality, which limits reconstruction robustness under complex illumination and adverse weather. This work proposes a cross-modal FiLM (feature-wise linear modulation) mechanism that uses thermal structural priors to guide texture synthesis, together with a modality-adaptive geometric decoupling strategy that models thermal radiance and visible-light geometry independently. By combining an explicit spherical harmonics representation with an implicit neural decoder in a hybrid rendering pipeline, the method achieves state-of-the-art rendering quality in both the visible and thermal spectra on the RGBT-Scenes dataset.
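To make the FiLM idea concrete, the following is a minimal sketch of feature-wise linear modulation in plain Python: a conditioning vector (standing in for a thermal structural prior) is projected to a per-channel scale `gamma` and shift `beta`, which are then applied to the shared features. The function names, weight shapes, and toy values are assumptions for illustration; the summary does not specify how ThermoSplat parameterizes its modulation.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Sequence[float], x: Sequence[float]) -> Vector:
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def film(
    features: Sequence[float],
    thermal_prior: Sequence[float],
    w_gamma: Matrix,
    b_gamma: Sequence[float],
    w_beta: Matrix,
    b_beta: Sequence[float],
) -> Vector:
    """FiLM: scale and shift each feature channel using the conditioning vector.

    The projection weights are hypothetical stand-ins for learned parameters.
    """
    gamma = linear(w_gamma, b_gamma, thermal_prior)  # per-channel scale
    beta = linear(w_beta, b_beta, thermal_prior)     # per-channel shift
    return [g * f + b for g, f, b in zip(gamma, features, beta)]


# Toy example: 2 feature channels conditioned on a 2-dimensional prior.
shared = [1.0, -2.0]
prior = [0.5, 1.0]
w_gamma = [[1.0, 0.0], [0.0, 1.0]]
b_gamma = [1.0, 1.0]
w_beta = [[0.0, 0.0], [0.0, 0.0]]
b_beta = [0.0, 0.5]

modulated = film(shared, prior, w_gamma, b_gamma, w_beta, b_beta)
```

With zero projection weights, `gamma = 1`, and `beta = 0`, the function returns the features unchanged, which is a common way to initialize FiLM layers so that conditioning starts as an identity.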
📝 Abstract
Multi-modal scene reconstruction integrating RGB and thermal infrared data is essential for robust environmental perception across diverse lighting and weather conditions. However, extending 3D Gaussian Splatting (3DGS) to multi-spectral scenarios remains challenging. Current approaches struggle to fully exploit the complementary information in multi-modal data: they either neglect cross-modal correlations or rely on shared representations that cannot adapt to the complex structural correlations and physical discrepancies between spectra. To address these limitations, we propose ThermoSplat, a novel framework that enables deep spectral-aware reconstruction through active feature modulation and adaptive geometry decoupling. First, we introduce a Spectrum-Aware Adaptive Modulation mechanism that dynamically conditions shared latent features on thermal structural priors, guiding visible texture synthesis with reliable cross-modal geometric cues. Second, to accommodate modality-specific geometric inconsistencies, we propose a Modality-Adaptive Geometric Decoupling scheme that learns independent opacity offsets and executes a separate rasterization pass for the thermal branch. Additionally, a hybrid rendering pipeline integrates explicit Spherical Harmonics with implicit neural decoding, ensuring both semantic consistency and high-frequency detail preservation. Extensive experiments on the RGBT-Scenes dataset demonstrate that ThermoSplat achieves state-of-the-art rendering quality across both the visible and thermal spectra.
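The effect of a per-Gaussian opacity offset with a separate thermal pass can be illustrated with a small single-ray sketch. It assumes the offset is added to the shared opacity in logit space before the sigmoid and that each modality is composited front to back on its own; the abstract does not state these details, so `render_both` and its arguments are hypothetical, and scalar intensities stand in for full colors.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def composite(values: Sequence[float], opacities: Sequence[float]) -> float:
    """Front-to-back alpha compositing of depth-sorted samples along one ray."""
    transmittance = 1.0
    out = 0.0
    for value, alpha in zip(values, opacities):
        out += transmittance * alpha * value
        transmittance *= 1.0 - alpha
    return out


def render_both(
    opacity_logits: Sequence[float],
    thermal_offsets: Sequence[float],
    rgb_values: Sequence[float],
    thermal_values: Sequence[float],
) -> Tuple[float, float]:
    """Composite the visible and thermal branches in two independent passes.

    Assumption: the thermal opacity is sigmoid(shared_logit + learned_offset).
    """
    rgb_alpha: List[float] = [sigmoid(l) for l in opacity_logits]
    thermal_alpha: List[float] = [
        sigmoid(l + d) for l, d in zip(opacity_logits, thermal_offsets)
    ]
    return composite(rgb_values, rgb_alpha), composite(thermal_values, thermal_alpha)


# Two Gaussians on one ray; the first is visible in RGB but is made
# nearly transparent in the thermal pass by a large negative offset.
rgb_pixel, thermal_pixel = render_both(
    opacity_logits=[0.0, 0.0],
    thermal_offsets=[-50.0, 0.0],
    rgb_values=[1.0, 0.0],
    thermal_values=[1.0, 0.0],
)
```

Here the same Gaussian contributes to the visible pixel but almost nothing to the thermal pixel, which is the kind of modality-specific geometric inconsistency (for example, a surface that is opaque in visible light but not in thermal infrared) that a decoupled opacity is meant to absorb.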