UNIV: Unified Foundation Model for Infrared and Visible Modalities

📅 2025-09-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing visible-light (RGB) and infrared pre-trained models achieve strong performance on single-modal tasks but generalize poorly to multimodal collaborative perception (e.g., autonomous driving under adverse weather). To address this, we propose UNIV, a unified vision foundation model for cross-modal joint understanding. Our method introduces Patch-wise Cross-modality Contrastive Learning (PCCL), an attention-guided distillation mechanism inspired by the lateral inhibition of retinal horizontal cells, to align features across modalities, together with a dual-knowledge preservation architecture, modeled on bipolar cell signal routing, that combines LoRA adapters with synchronous distillation to mitigate catastrophic forgetting; the framework remains compatible with diverse Transformer architectures. Additionally, we construct the large-scale, precisely aligned MVIP dataset. Experiments demonstrate improvements of +1.7 mIoU on infrared semantic segmentation and +0.7 mAP on infrared object detection, while preserving over 99% of baseline performance on visible-light tasks.

📝 Abstract
The demand for joint RGB-visible and infrared perception is growing rapidly, particularly to achieve robust performance under diverse weather conditions. Although pre-trained models for RGB-visible and infrared data excel in their respective domains, they often underperform in multimodal scenarios, such as autonomous vehicles equipped with both sensors. To address this challenge, we propose a biologically inspired UNified foundation model for Infrared and Visible modalities (UNIV), featuring two key innovations. First, we introduce Patch-wise Cross-modality Contrastive Learning (PCCL), an attention-guided distillation framework that mimics retinal horizontal cells' lateral inhibition, which enables effective cross-modal feature alignment while remaining compatible with any transformer-based architecture. Second, our dual-knowledge preservation mechanism emulates the retina's bipolar cell signal routing, combining LoRA adapters (2% added parameters) with synchronous distillation to prevent catastrophic forgetting, thereby replicating the retina's photopic (cone-driven) and scotopic (rod-driven) functionality. To support cross-modal learning, we introduce the MVIP dataset, the most comprehensive visible-infrared benchmark to date. It contains 98,992 precisely aligned image pairs spanning diverse scenarios. Extensive experiments demonstrate UNIV's superior performance on infrared tasks (+1.7 mIoU in semantic segmentation and +0.7 mAP in object detection) while maintaining 99%+ of the baseline performance on visible RGB tasks. Our code is available at https://github.com/fangyuanmao/UNIV.
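The patch-wise contrastive objective described in the abstract can be sketched as a symmetric InfoNCE loss over spatially aligned patch embeddings from the two modality branches. This is a minimal illustration, not the paper's implementation: the function name, the temperature value, and the assumption that row i of each tensor is the same spatial patch are all choices made here for clarity.

```python
import torch
import torch.nn.functional as F

def patchwise_contrastive_loss(vis_patches, ir_patches, temperature=0.07):
    """Symmetric InfoNCE over aligned patch embeddings (illustrative sketch).

    vis_patches, ir_patches: (N, D) patch embeddings from the visible and
    infrared branches; row i of each tensor is assumed to correspond to the
    same spatial patch, so the diagonal pairs are the positives.
    """
    vis = F.normalize(vis_patches, dim=-1)
    ir = F.normalize(ir_patches, dim=-1)
    logits = vis @ ir.t() / temperature  # (N, N) cosine-similarity matrix
    targets = torch.arange(vis.size(0), device=vis.device)
    # Average both matching directions: visible->infrared and infrared->visible.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```

In UNIV this loss is additionally attention-guided (weighting patches by an attention map from the teacher), a detail omitted from this sketch.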
Problem

Research questions and friction points this paper is trying to address.

Improve cross-modal performance for RGB and infrared sensors
Address multimodal underperformance in autonomous vehicle perception
Enable robust perception under diverse weather conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Patch-wise Cross-modality Contrastive Learning
Dual-knowledge preservation mechanism
Transformer-based architecture with LoRA adapters
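The LoRA component of the dual-knowledge mechanism keeps the pre-trained weights frozen and learns only a low-rank update, which is how the parameter overhead stays around 2%. Below is a generic LoRA linear layer in PyTorch; the class name, rank, and alpha are illustrative defaults, and where UNIV inserts these adapters inside the Transformer is not shown here.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update: W x + (B A) x."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # preserve pre-trained knowledge
        # A is small random, B is zero, so the adapter starts as a no-op.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale
```

Because B is initialized to zero, the wrapped layer reproduces the frozen model exactly at the start of fine-tuning, and only the rank-r factors A and B receive gradients.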
Fangyuan Mao
Student at Institute of Computing Technology
Shuo Wang
Research Center for Intelligent Computing Systems, CAS ICT; University of Chinese Academy of Sciences
Jilin Mei
Research Center for Intelligent Computing Systems, Institute of Computing Technology, University of Chinese Academy of Sciences
Chen Min
Research Center for Intelligent Computing Systems, CAS ICT; University of Chinese Academy of Sciences
Shun Lu
Institute of Computing Technology, Chinese Academy of Sciences
Fuyang Liu
University of Chinese Academy of Sciences
Yu Hu
Research Center for Intelligent Computing Systems, CAS ICT; University of Chinese Academy of Sciences