DuGI-MAE: Improving Infrared Mask Autoencoders via Dual-Domain Guidance

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing infrared foundation models (e.g., InfMAE) have three weaknesses: random masking can discard informative, high-entropy tokens; global relationships between tokens are modeled insufficiently; and non-uniform background noise is not handled. To address these limitations, the authors propose DuGI-MAE, a dual-domain guided infrared foundation model based on the Masked Autoencoder (MAE). It introduces an entropy-driven deterministic masking strategy that explicitly preserves highly informative tokens. In addition, a Dual-Domain Guidance (DDG) module jointly captures global token dependencies and adaptively suppresses non-uniform background noise. The model is pre-trained on Inf-590K, a large-scale infrared dataset constructed by the authors. Experiments show that DuGI-MAE outperforms both supervised and self-supervised comparison methods on downstream infrared vision tasks, including infrared object detection, semantic segmentation, and small target detection.

📝 Abstract
Infrared imaging plays a critical role in low-light and adverse weather conditions. However, due to the distinct characteristics of infrared images, existing foundation models such as Masked Autoencoder (MAE) trained on visible data perform suboptimally in infrared image interpretation tasks. To bridge this gap, an infrared foundation model known as InfMAE was developed and pre-trained on large-scale infrared datasets. Despite its effectiveness, InfMAE still faces several limitations, including the omission of informative tokens, insufficient modeling of global associations, and neglect of non-uniform noise. In this paper, we propose a Dual-domain Guided Infrared foundation model based on MAE (DuGI-MAE). First, we design a deterministic masking strategy based on token entropy, preserving only high-entropy tokens for reconstruction to enhance informativeness. Next, we introduce a Dual-Domain Guidance (DDG) module, which simultaneously captures global token relationships and adaptively filters non-uniform background noise commonly present in infrared imagery. To facilitate large-scale pretraining, we construct Inf-590K, a comprehensive infrared image dataset encompassing diverse scenes, various target types, and multiple spatial resolutions. Pretrained on Inf-590K, DuGI-MAE demonstrates strong generalization capabilities across various downstream tasks, including infrared object detection, semantic segmentation, and small target detection. Experimental results validate the superiority of the proposed method over both supervised and self-supervised comparison methods. Our code is available in the supplementary material.
Problem

Research questions and friction points this paper is trying to address.

Improves infrared image interpretation via dual-domain guidance
Addresses token omission and global association modeling in infrared MAE
Mitigates non-uniform noise in infrared imagery for better analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deterministic masking based on token entropy
Dual-domain guidance for global relationships and noise
Large-scale pretraining on diverse infrared dataset
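
The entropy-based deterministic masking idea can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the helper names `patch_entropy` and `entropy_mask`, the histogram-based entropy estimate, the 16-pixel patch size, the 32 bins, and the 0.75 mask ratio are all assumptions made for illustration. It also assumes that high-entropy tokens are the ones kept visible; if the intended design instead selects them as reconstruction targets, the selection would be inverted.

```python
import numpy as np


def patch_entropy(image, patch=16, bins=32):
    """Shannon entropy (in bits) of the intensity histogram of each patch.

    `image` is a 2-D array; values are clipped to [0, 1] (assumed normalization).
    """
    image = np.clip(np.asarray(image, dtype=np.float64), 0.0, 1.0)
    h, w = image.shape
    gh, gw = h // patch, w // patch
    # Split into non-overlapping patches: shape (gh * gw, patch * patch).
    patches = (image[:gh * patch, :gw * patch]
               .reshape(gh, patch, gw, patch)
               .transpose(0, 2, 1, 3)
               .reshape(gh * gw, patch * patch))
    entropies = np.empty(gh * gw)
    for i, p in enumerate(patches):
        counts, _ = np.histogram(p, bins=bins, range=(0.0, 1.0))
        prob = counts[counts > 0] / counts.sum()
        entropies[i] = -np.sum(prob * np.log2(prob))
    return entropies


def entropy_mask(image, patch=16, mask_ratio=0.75, bins=32):
    """Deterministically keep the highest-entropy tokens and mask the rest.

    Returns (keep_indices, mask), where mask[i] is True for masked tokens.
    """
    ent = patch_entropy(image, patch, bins)
    n_keep = max(1, int(round(ent.size * (1.0 - mask_ratio))))
    # A stable sort makes tie-breaking reproducible (lower index wins).
    order = np.argsort(-ent, kind="stable")
    keep = np.sort(order[:n_keep])
    mask = np.ones(ent.size, dtype=bool)
    mask[keep] = False
    return keep, mask


# Hypothetical usage: a flat 64x64 image with one textured patch.
rng = np.random.default_rng(0)
demo = np.zeros((64, 64))
demo[32:48, 16:32] = rng.random((16, 16))  # patch at grid row 2, col 1
keep_idx, token_mask = entropy_mask(demo)
```

Unlike random masking, the same image always yields the same mask here, and the textured patch is guaranteed to be among the visible tokens.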
Yinghui Xing
School of Computer Science, Northwestern Polytechnical University, China
Xiaoting Su
School of Computer Science, Northwestern Polytechnical University, China
Shizhou Zhang
Northwestern Polytechnical University
computer vision, machine learning
Donghao Chu
School of Computer Science, Northwestern Polytechnical University, China
Di Xu
Professor
Economics of Education, Higher Education Policy, Program Evaluation, Community Colleges, Online