ThermalGen: Style-Disentangled Flow-Based Generative Models for RGB-to-Thermal Image Translation

📅 2025-09-29
🤖 AI Summary
The scarcity of synchronized RGB-thermal image pairs severely hinders cross-modal perception tasks such as alignment and retrieval. To address this, we propose the first flow-based style-disentangled generative model for controllable, high-fidelity thermal image synthesis from a single RGB input. Our method integrates RGB content via a learnable conditional guidance mechanism and explicitly disentangles style factors—including viewpoint, sensor response, and environmental conditions—to enhance realism and generalization of synthesized thermal images. Trained on a large-scale, self-collected paired dataset, our approach outperforms or matches state-of-the-art GANs and diffusion models across multiple benchmarks. It achieves significant improvements in cross-modal image alignment and retrieval, notably boosting mean average precision (mAP) by +8.2%. This work establishes a novel paradigm for low-resource cross-modal learning, enabling robust thermal synthesis without requiring extensive paired supervision.

📝 Abstract
Paired RGB-thermal data is crucial for visual-thermal sensor fusion and cross-modality tasks, including important applications such as multi-modal image alignment and retrieval. However, the scarcity of synchronized and calibrated RGB-thermal image pairs presents a major obstacle to progress in these areas. To overcome this challenge, RGB-to-Thermal (RGB-T) image translation has emerged as a promising solution, enabling the synthesis of thermal images from abundant RGB datasets for training purposes. In this study, we propose ThermalGen, an adaptive flow-based generative model for RGB-T image translation that incorporates an RGB image conditioning architecture and a style-disentangled mechanism. To support large-scale training, we curated eight public satellite-aerial, aerial, and ground RGB-T paired datasets, and introduced three new large-scale satellite-aerial RGB-T datasets (DJI-day, Bosonplus-day, and Bosonplus-night) captured across diverse times, sensor types, and geographic regions. Extensive evaluations across multiple RGB-T benchmarks demonstrate that ThermalGen achieves translation performance comparable or superior to existing GAN-based and diffusion-based methods. To our knowledge, ThermalGen is the first RGB-T image translation model capable of synthesizing thermal images that reflect significant variations in viewpoint, sensor characteristics, and environmental conditions. Project page: http://xjh19971.github.io/ThermalGen
Problem

Research questions and friction points this paper is trying to address.

Generating thermal images from RGB data for cross-modality applications
Overcoming scarcity of calibrated RGB-thermal image pairs for training
Translating RGB to thermal across varying viewpoints and conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Flow-based generative model for RGB-thermal translation
Style-disentangled mechanism with RGB conditioning architecture
Synthesizes thermal images across diverse sensor conditions
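The flow-based training objective behind models of this kind can be illustrated with a minimal conditional flow-matching sketch. This is not the paper's architecture: the "velocity network" below is a plain linear map, and random vectors stand in for flattened RGB/thermal image pairs. All sizes and names (`dim`, `n_pairs`, `fm_loss_and_grad`) are hypothetical, chosen only to make the idea runnable.

```python
import numpy as np

# Illustrative sketch of conditional flow matching (not ThermalGen's model).
# A velocity field v(x_t, t, rgb) is regressed onto the straight-line
# displacement from noise x0 to the paired thermal target, conditioned on RGB.

rng = np.random.default_rng(0)
dim = 16          # flattened "image" size (hypothetical)
n_pairs = 256     # number of RGB-thermal training pairs (hypothetical)

rgb = rng.normal(size=(n_pairs, dim))                         # conditioning input
thermal = 0.5 * rgb + 0.1 * rng.normal(size=(n_pairs, dim))   # paired target

# Linear stand-in for the velocity network: v = W @ [x_t, t, rgb]
W = np.zeros((dim, 2 * dim + 1))
lr = 0.05

def fm_loss_and_grad(W):
    x0 = rng.normal(size=(n_pairs, dim))            # noise sample at t = 0
    t = rng.uniform(size=(n_pairs, 1))              # random time in [0, 1]
    xt = (1 - t) * x0 + t * thermal                 # straight-line interpolant
    target_v = thermal - x0                         # ideal velocity along the path
    feats = np.concatenate([xt, t, rgb], axis=1)    # condition on the RGB image
    err = feats @ W.T - target_v
    loss = np.mean(err ** 2)
    grad = 2 * err.T @ feats / (n_pairs * dim)      # d loss / d W
    return loss, grad

loss0, _ = fm_loss_and_grad(W)
for _ in range(200):                                # a few gradient steps
    loss, grad = fm_loss_and_grad(W)
    W -= lr * grad
loss_final, _ = fm_loss_and_grad(W)
```

At inference time, a thermal image would be sampled by integrating the learned velocity field from noise at t = 0 to t = 1 (e.g. with Euler steps), keeping the RGB conditioning fixed.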
Authors
Jiuhong Xiao (New York University)
Roshan Nayak (RND Engineer @Synopsys Inc; Deep Learning, NLP)
Ning Zhang (Technology Innovation Institute)
Daniel Tortei (Technology Innovation Institute)
Giuseppe Loianno (UC Berkeley; Robotics, MAVs, Vision, Sensor Fusion)