🤖 AI Summary
This work addresses the challenge of efficiently constructing 3D radiance fields for thermal infrared scenes while minimizing reliance on visible-light or multimodal data. It proposes Thermal-to-Depth Gaussian Splatting (TDg), a method that derives radiance fields from thermal images and estimated depth alone. TDg uses monocular depth estimation to guide geometric modeling within the 3D Gaussian Splatting framework, achieving competitive rendering fidelity while substantially reducing computational overhead. On the RGBT-Scenes and ThermalMix datasets, TDg outperforms the MSMG (Multiple Single-Modal Gaussians) baseline in most cases on the rendering quality metrics LPIPS, SSIM, and PSNR, with average gains of 1.12%, 0.034%, and 0.01% respectively. It also shortens training by 12 minutes 47 seconds (a 55% reduction), supporting its efficiency and practicality for thermal-only 3D scene reconstruction.
📝 Abstract
Efficient and robust 3D scene representation is crucial in autonomous driving, robotics, and related fields. While RGB images provide valuable content for 3D reconstruction, other modalities such as thermal or depth can provide additional information about the environment. Recently, novel view synthesis methods such as 3D Gaussian Splatting have started using multiple modalities to further boost their performance. However, fusing or combining multimodal data can slow the process down and introduce additional challenges. Our project therefore aims to use a single modality, thermal infrared, removing the reliance on visible light as much as possible. A single-modality approach can be expected to be faster because it does not depend on multimodal data. We propose a method, Thermal-to-Depth Gaussian Splatting (TDg), that uses only thermal images and depth estimation in its architecture to derive the radiance fields. TDg outperforms the MSMG (Multiple Single-Modal Gaussians) baseline in most cases on our test datasets, RGBT-Scenes and ThermalMix. On average, TDg improves on the MSMG baseline by 1.12% in learned perceptual image patch similarity (LPIPS), 0.034% in structural similarity index measure (SSIM), and 0.01% in peak signal-to-noise ratio (PSNR). It also reduces training time significantly, by 12 minutes 47 seconds (a 55% improvement). Overall, our method succeeds in deriving thermal radiance fields, which have several potential applications, such as identifying heat sources in surveillance, search and rescue operations, and industrial inspections, where temperature is widely used to monitor machines.
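The abstract reports the training-time saving both in absolute terms (12 minutes 47 seconds) and in relative terms (55%), so the underlying training times can be estimated from those two figures. The sketch below is a back-of-the-envelope calculation, not something taken from the paper: it assumes the 55% is measured relative to the MSMG baseline's training time, and the function names `implied_training_times` and `fmt` are hypothetical helpers introduced only for illustration. Because the 55% figure is presumably rounded, the results are approximate.

```python
def implied_training_times(saved_seconds: float, relative_reduction: float) -> tuple[float, float]:
    """Return (baseline_seconds, reduced_seconds) implied by an absolute and a relative saving.

    Assumption: relative_reduction = saved_seconds / baseline_seconds.
    """
    if not 0 < relative_reduction < 1:
        raise ValueError("relative_reduction must be strictly between 0 and 1")
    baseline = saved_seconds / relative_reduction
    return baseline, baseline - saved_seconds


def fmt(seconds: float) -> str:
    """Format a duration in seconds as 'M min SS s', rounded to the nearest second."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs:02d} s"


saved = 12 * 60 + 47  # 12 minutes 47 seconds from the abstract, i.e. 767 s
baseline, tdg = implied_training_times(saved, 0.55)  # 0.55 is the reported 55%

print("implied baseline (MSMG) training time:", fmt(baseline))  # 23 min 15 s
print("implied TDg training time:", fmt(tdg))                   # 10 min 28 s
```

Under that assumption, the reported figures imply roughly 23 minutes of training for the baseline against roughly 10 and a half minutes for TDg.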