🤖 AI Summary
To address the weak generalization of implicit neural models across heterogeneous imaging modalities (RGB, monochrome, near-infrared, polarization, and multispectral) and the scarcity of multimodal training data, this paper introduces MultimodalStudio (MMS), comprising MMS-DATA, a multi-view, multimodal dataset of 32 scenes spanning all five modalities, and MMS-FW, a modular multimodal Neural Radiance Fields (NeRF) framework. MMS-FW operates directly on raw multi-channel sensor data and supports an arbitrary number of multi-channel devices, enabling cross-modal information transfer within a single volume rendering model. Extensive experiments on the 32 real-world scenes demonstrate that training across modalities yields higher-quality renderings than training on single modalities alone. The dataset and framework are publicly released to promote research on multimodal volume rendering and multimodal 3D perception.
📝 Abstract
Neural Radiance Fields (NeRF) have shown impressive performance in rendering 3D scenes from arbitrary viewpoints. While RGB images are widely preferred for training volume rendering models, interest in other radiance modalities is also growing. However, the capability of the underlying implicit neural models to learn and transfer information across heterogeneous imaging modalities has seldom been explored, mostly due to limited training data availability. For this purpose, we present MultimodalStudio (MMS), which encompasses MMS-DATA and MMS-FW. MMS-DATA is a multimodal multi-view dataset containing 32 scenes acquired with 5 different imaging modalities: RGB, monochrome, near-infrared, polarization and multispectral. MMS-FW is a novel modular multimodal NeRF framework designed to handle multimodal raw data and to support an arbitrary number of multi-channel devices. Through extensive experiments, we demonstrate that MMS-FW trained on MMS-DATA can transfer information between different imaging modalities and produce higher-quality renderings than using single modalities alone. We publicly release the dataset and the framework to promote research on multimodal volume rendering and beyond.
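To make the idea of a single volume rendering model serving several multi-channel devices concrete, here is a minimal, hypothetical NumPy sketch (not the authors' code): a shared implicit field yields a density and a feature vector at each ray sample, and per-modality decoder heads map that shared feature to the channel count of each sensor (RGB: 3, monochrome: 1, multispectral: e.g. 8 bands). The function names and the toy field are assumptions for illustration; only the alpha-compositing step follows the standard NeRF volume rendering equation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_field(points):
    """Toy stand-in for a shared implicit MLP: density + feature per sample."""
    density = np.clip(np.sin(points.sum(axis=-1)) + 1.0, 0.0, None)          # (N,)
    feature = np.stack([np.cos(points[..., i]) for i in range(3)], axis=-1)  # (N, 3)
    return density, feature

def modality_head(feature, out_channels, seed):
    """Toy per-modality decoder: fixed random linear map + sigmoid."""
    w = np.random.default_rng(seed).normal(size=(feature.shape[-1], out_channels))
    return 1.0 / (1.0 + np.exp(-(feature @ w)))  # (N, out_channels) in (0, 1)

def render_ray(points, deltas, out_channels, seed):
    """Standard NeRF alpha compositing along one ray, per modality."""
    density, feature = shared_field(points)
    radiance = modality_head(feature, out_channels, seed)
    alpha = 1.0 - np.exp(-density * deltas)                          # (N,)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))    # transmittance
    weights = alpha * trans                                          # (N,)
    return (weights[:, None] * radiance).sum(axis=0)                 # (out_channels,)

# One ray with 64 samples; the same shared geometry is rendered
# into three modalities with different channel counts.
points = rng.normal(size=(64, 3))
deltas = np.full(64, 0.05)
rgb  = render_ray(points, deltas, out_channels=3, seed=1)
mono = render_ray(points, deltas, out_channels=1, seed=2)
ms   = render_ray(points, deltas, out_channels=8, seed=3)
print(rgb.shape, mono.shape, ms.shape)
```

Because density and features are shared across heads, gradients from any modality's reconstruction loss would update the common geometry, which is the mechanism by which one modality can improve renderings in another.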