Multi-Attribute-Guided Thermal Face Image Translation Based on Latent Diffusion Model

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address distortion and identity feature degradation caused by cross-modal translation in heterogeneous infrared–visible face recognition (HFR), this paper proposes a multi-attribute-guided latent diffusion model. Methodologically, we introduce a novel Self-Attention Mamba (Self-attn Mamba) module to enhance long-range cross-modal dependency modeling while improving inference efficiency; additionally, a multi-attribute classifier is integrated to enforce fine-grained semantic constraints, ensuring high-fidelity image generation and robust identity preservation. Extensive experiments on two mainstream benchmarks demonstrate that our approach achieves state-of-the-art performance across key metrics—including LPIPS, FID, and recognition accuracy—outperforming existing HFR methods by significant margins. The proposed framework establishes a new paradigm for robust face recognition in low-light and nighttime surveillance scenarios.
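The multi-attribute guidance described above resembles classifier guidance in diffusion sampling, where the gradient of a classifier's log-probability nudges each denoising step toward latents that carry the desired attributes. The paper does not publish its update rule, so the following is only a minimal NumPy sketch under that assumption; `attribute_logp`, the quadratic stand-in classifier, and `guided_denoise_step` are all hypothetical names, not the authors' implementation.

```python
import numpy as np

def attribute_logp(z, target):
    # Hypothetical stand-in for a multi-attribute classifier's
    # log-probability: higher when the latent z matches the target
    # attribute vector (a real model would be a trained network).
    return -0.5 * np.sum((z - target) ** 2)

def guided_denoise_step(z, denoised, target, scale=0.1, eps=1e-4):
    # One guided update: take the denoiser's prediction, then nudge it
    # along a central-difference estimate of the attribute log-prob
    # gradient (an autograd framework would supply this directly).
    grad = np.zeros_like(z)
    for i in range(z.size):
        dz = np.zeros_like(z)
        dz.flat[i] = eps
        grad.flat[i] = (attribute_logp(z + dz, target)
                        - attribute_logp(z - dz, target)) / (2 * eps)
    return denoised + scale * grad

z = np.array([1.0, -2.0, 0.5])          # current noisy latent
target = np.array([0.0, 0.0, 0.0])      # desired attribute code
denoised = 0.9 * z                      # pretend diffusion-denoiser output
z_next = guided_denoise_step(z, denoised, target)
```

The `scale` parameter plays the role of the guidance strength: larger values enforce the attribute constraint more aggressively at some cost to sample fidelity.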

📝 Abstract
Modern surveillance systems increasingly rely on multi-wavelength sensors and deep neural networks to recognize faces in infrared images captured at night. However, most facial recognition models are trained on visible light datasets, leading to substantial performance degradation on infrared inputs due to significant domain shifts. Early feature-based methods for infrared face recognition proved ineffective, prompting researchers to adopt generative approaches that convert infrared images into visible light images for improved recognition. This paradigm, known as Heterogeneous Face Recognition (HFR), faces challenges such as model and modality discrepancies, leading to distortion and feature loss in generated images. To address these limitations, this paper introduces a novel latent diffusion-based model designed to generate high-quality visible face images from thermal inputs while preserving critical identity features. A multi-attribute classifier is incorporated to extract key facial attributes from visible images, mitigating feature loss during infrared-to-visible image restoration. Additionally, we propose the Self-attn Mamba module, which enhances global modeling of cross-modal features and significantly improves inference speed. Experimental results on two benchmark datasets demonstrate the superiority of our approach, achieving state-of-the-art performance in both image quality and identity preservation.
Problem

Research questions and friction points this paper is trying to address.

Convert thermal face images to visible light for recognition
Preserve identity features during infrared-to-visible image translation
Enhance cross-modal feature modeling and inference speed
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent diffusion model for thermal-to-visible face translation
Multi-attribute classifier to preserve identity features
Self-attn Mamba module for cross-modal feature modeling
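On the long-range modeling claim behind the Self-attn Mamba module: the attention half of such a module lets every latent token attend to every other, which is what gives global receptive field across a flattened feature map (the selective state-space half, which provides the efficiency gains, is omitted here). A minimal single-head self-attention sketch in NumPy, with all weight names and sizes chosen for illustration only:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, wq, wk, wv):
    # Single-head self-attention over a sequence of latent tokens:
    # every position attends to every other position, providing the
    # global (long-range) dependency modeling described above.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])  # scaled dot-product
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq_len, dim = 16, 8                 # e.g. a flattened 4x4 latent map
x = rng.normal(size=(seq_len, dim))
wq, wk, wv = (rng.normal(size=(dim, dim)) for _ in range(3))
out = self_attention(x, wq, wk, wv)  # same shape as the input tokens
```

Plain attention costs O(n²) in sequence length, which is precisely the overhead that motivates pairing it with a linear-time Mamba-style scan for faster inference.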
Mingshu Cai
Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, Fukuoka, Japan
Osamu Yoshie
Waseda University
Yuya Ieiri
Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, Fukuoka, Japan