CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis

📅 2025-04-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing AI models for spectral imaging generalize poorly and transfer badly across devices because cameras differ substantially in spectral channel count and wavelength response. To address this, we propose CARL, a camera-agnostic representation learning framework that models RGB, multispectral, and hyperspectral images in a unified way. Our key innovations include: (1) wavelength positional encoding, which explicitly incorporates the physical wavelength of each channel as a prior; (2) a query-based mechanism that combines self-attention and cross-attention to compress spectral information for efficient spectral-spatial fusion; and (3) a JEPA-inspired spectral-spatial self-supervised pre-training paradigm. CARL achieves state-of-the-art performance on medical imaging, autonomous driving, and satellite remote sensing tasks. It remains robust under both simulated and real-world cross-camera spectral variations and supports plug-and-play downstream adaptation. CARL is presented as the first foundational spectral representation model that generalizes across both imaging modalities and cameras.
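To make the idea of wavelength positional encoding concrete, here is a minimal sketch in plain Python. It assumes a sinusoidal encoding of each channel's center wavelength in nanometers; the summary does not state the exact form CARL uses, so the function name `wavelength_positional_encoding`, the `dim` and `max_period` parameters, and the nominal RGB wavelengths below are all illustrative assumptions.

```python
import math

def wavelength_positional_encoding(wavelengths_nm, dim=8, max_period=10000.0):
    """Map each channel's center wavelength (nm) to a dim-sized sinusoidal vector.

    Assumption: a Transformer-style sin/cos encoding applied to the physical
    wavelength instead of the channel index. This is not confirmed to be the
    exact encoding used in CARL.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    encodings = []
    for wl in wavelengths_nm:
        vec = []
        for i in range(dim // 2):
            # Geometric progression of frequencies, as in standard sinusoidal encodings.
            freq = 1.0 / (max_period ** (2 * i / dim))
            vec.append(math.sin(wl * freq))
            vec.append(math.cos(wl * freq))
        encodings.append(vec)
    return encodings

# Hypothetical cameras: a 3-channel RGB sensor and a 100-channel hyperspectral sensor.
rgb = wavelength_positional_encoding([610.0, 540.0, 465.0])
hsi = wavelength_positional_encoding([500.0 + 5.0 * k for k in range(100)])
print(len(rgb), len(rgb[0]))
print(len(hsi), len(hsi[0]))
```

Because the encoding depends on the wavelength rather than on the channel index, two cameras with different channel counts produce per-channel vectors of the same width, and channels at the same wavelength receive the same code.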

📝 Abstract
Spectral imaging offers promising applications across diverse domains, including medicine and urban scene understanding, and is already established as a critical modality in remote sensing. However, variability in channel dimensionality and captured wavelengths among spectral cameras impedes the development of AI-driven methodologies, leading to camera-specific models with limited generalizability and inadequate cross-camera applicability. To address this bottleneck, we introduce $\textbf{CARL}$, a model for $\textbf{C}$amera-$\textbf{A}$gnostic $\textbf{R}$epresentation $\textbf{L}$earning across RGB, multispectral, and hyperspectral imaging modalities. To enable the conversion of a spectral image with any channel dimensionality to a camera-agnostic embedding, we introduce wavelength positional encoding and a self-attention-cross-attention mechanism to compress spectral information into learned query representations. Spectral-spatial pre-training is achieved with a novel spectral self-supervised JEPA-inspired strategy tailored to CARL. Large-scale experiments across the domains of medical imaging, autonomous driving, and satellite imaging demonstrate our model's unique robustness to spectral heterogeneity: it outperforms the compared methods on datasets with simulated and real-world cross-camera spectral variations. The scalability and versatility of the proposed approach position our model as a backbone for future spectral foundation models.
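The compression of a variable number of spectral channels into a fixed set of learned queries can be sketched as follows. This is a simplified illustration, not the CARL architecture: it shows only a single cross-attention step, omits the self-attention part and all learned projections, and uses random vectors where a trained model would have learned queries. The names `cross_attention_compress`, `queries`, and `tokens` are hypothetical.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention_compress(queries, tokens):
    """Compress C channel tokens into len(queries) vectors via cross-attention.

    queries: K vectors of size d (learned in a real model; random here).
    tokens:  C vectors of size d, one per spectral channel.
    Assumption: keys and values are the tokens themselves, with no learned
    projection matrices, to keep the sketch short.
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Scaled dot-product score between this query and every channel token.
        scores = [sum(qi * ti for qi, ti in zip(q, t)) / math.sqrt(d) for t in tokens]
        weights = softmax(scores)
        # Weighted average of the channel tokens.
        out.append([sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(d)])
    return out

rng = random.Random(0)
d, K = 8, 4
queries = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(K)]
for C in (3, 25, 100):  # hypothetical RGB, multispectral, hyperspectral channel counts
    tokens = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(C)]
    z = cross_attention_compress(queries, tokens)
    print(C, "channels ->", len(z), "x", len(z[0]))
```

The output size depends only on the number of queries and the embedding width, which is what lets one downstream model consume images from cameras with different channel counts.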
Problem

Research questions and friction points this paper is trying to address.

Addressing spectral camera variability hindering AI model generalizability
Developing camera-agnostic representation learning for multi-modal spectral imaging
Overcoming cross-camera spectral heterogeneity in medical and remote sensing applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Camera-agnostic embedding via wavelength positional encoding
Self-attention-cross-attention for spectral compression
Spectral self-supervised JEPA-inspired pre-training strategy
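The JEPA-inspired pre-training item can be illustrated with the generic ingredients of a joint-embedding predictive setup: some tokens are masked, a predictor estimates their embeddings from the visible context, and the loss is computed in embedding space against a target encoder that is typically updated as an exponential moving average. The sketch below shows only these generic pieces under those assumptions; how CARL masks spectral and spatial tokens, and whether it uses exactly this loss or update rule, is not stated here. All function names are hypothetical.

```python
def split_context_target(num_tokens, masked_indices):
    """Split token indices into visible context and masked prediction targets."""
    masked = set(masked_indices)
    context = [i for i in range(num_tokens) if i not in masked]
    target = sorted(masked)
    return context, target

def latent_prediction_loss(predicted, target):
    """Mean squared error between predicted and target embeddings.

    Assumption: a plain L2 regression loss in embedding space; the loss
    is computed on embeddings, not on pixels or raw spectra.
    """
    total, count = 0.0, 0
    for p_vec, t_vec in zip(predicted, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count

def ema_update(target_params, context_params, momentum=0.99):
    """Move target-encoder parameters toward the context encoder without gradients."""
    return [momentum * t + (1.0 - momentum) * c
            for t, c in zip(target_params, context_params)]

# Toy usage with 6 tokens, two of them masked.
context_idx, target_idx = split_context_target(6, [1, 4])
loss = latent_prediction_loss([[0.0, 1.0]], [[0.0, 3.0]])
new_target = ema_update([1.0, 2.0], [0.0, 0.0], momentum=0.9)
print(context_idx, target_idx, loss, new_target)
```

Predicting embeddings rather than reconstructing inputs is the defining choice of this family of methods: the model is not asked to reproduce every spectral detail of the masked region, only its representation.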
Alexander Baumann
Siemens AG, Munich; Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ) Heidelberg
Leonardo Ayala
Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ) Heidelberg
Silvia Seidlitz
Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ) Heidelberg; Faculty of Mathematics and Computer Science, Heidelberg University
Jan Sellner
Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ) Heidelberg; National Center for Tumor Diseases (NCT), NCT Heidelberg; HIDSS4Health, Heidelberg
Alexander Studier-Fischer
Department of Urology and Urosurgery, University Medical Center Mannheim; Department of General, Visceral, and Transplantation Surgery, Heidelberg University Hospital; Division of Intelligent Systems and Robotics in Urology, DKFZ Heidelberg; DKFZ Hector Cancer Institute, University Medical Center Mannheim
Berkin Ozdemir
Department of General, Visceral, and Transplantation Surgery, Heidelberg University Hospital; Division of Intelligent Systems and Robotics in Urology, DKFZ Heidelberg; DKFZ Hector Cancer Institute, University Medical Center Mannheim
Lena Maier-Hein
Division of Intelligent Medical Systems, German Cancer Research Center (DKFZ) Heidelberg; Medical Faculty and Faculty of Mathematics and Computer Science, Heidelberg University; National Center for Tumor Diseases (NCT), NCT Heidelberg; HIDSS4Health, Heidelberg
Slobodan Ilic
Senior Key Expert Research Scientist, Siemens AG and Adjunct Professor at TUM
Computer Vision