AnyThermal: Towards Learning Universal Representations for Thermal Perception

📅 2026-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing thermal imaging backbone networks exhibit limited generalization due to their reliance on small-scale, task-specific training. This work proposes the first task-agnostic universal representation for thermal imaging by transferring semantic features from the vision foundation model DINOv2 to a thermal encoder via knowledge distillation, trained on TartanRGBT—a novel, self-collected multi-environment synchronized RGB-thermal dataset. The resulting representation enables diverse downstream tasks—including cross-modal place recognition, thermal image segmentation, and monocular depth estimation—without task-specific fine-tuning. It achieves state-of-the-art performance across multiple benchmarks, with improvements of up to 36%. The authors also open-source both the TartanRGBT data collection platform and the dataset to support future research in thermal vision.

📝 Abstract
We present AnyThermal, a thermal backbone that captures robust, task-agnostic thermal features suitable for a variety of tasks, such as cross-modal place recognition, thermal segmentation, and monocular depth estimation from thermal images. Existing thermal backbones follow task-specific training on small-scale data, limiting their utility to a single environment and task. Unlike prior methods, AnyThermal can be used across a wide range of environments (indoor, aerial, off-road, urban) and tasks, all without task-specific training. Our key insight is to distill feature representations from visual foundation models such as DINOv2 into a thermal encoder using thermal data from these multiple environments. To bridge the diversity gap in existing RGB-thermal datasets, we introduce the TartanRGBT platform, the first open-source data collection platform with synchronized RGB-thermal image acquisition. We use this payload to collect the TartanRGBT dataset, a diverse and balanced collection spanning four environments. We demonstrate the efficacy of AnyThermal and TartanRGBT, achieving state-of-the-art results with improvements of up to 36% across diverse environments and downstream tasks on existing datasets.
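The core recipe the abstract describes — aligning a thermal encoder's features with those of a frozen RGB foundation model (DINOv2) on synchronized RGB-thermal pairs — can be sketched as a per-patch feature-distillation loss. This is a hypothetical illustration, not the paper's implementation: the function name, cosine-distance objective, and token shapes (256 patches of dimension 384, the DINOv2-S token size) are illustrative assumptions.

```python
import numpy as np

def cosine_distill_loss(teacher, student, eps=1e-8):
    """Mean cosine distance between per-patch teacher features (frozen RGB
    foundation model) and student features (trainable thermal encoder).

    teacher, student: arrays of shape (num_patches, dim).
    Returns 0 when the student reproduces the teacher exactly.
    """
    # Normalize each patch token so the loss only measures direction.
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(t * s, axis=1)))

# Toy check: a student close to the teacher scores lower than a random one.
rng = np.random.default_rng(0)
teacher = rng.standard_normal((256, 384))
student_aligned = teacher + 0.01 * rng.standard_normal((256, 384))
student_random = rng.standard_normal((256, 384))

print(cosine_distill_loss(teacher, student_aligned)
      < cosine_distill_loss(teacher, student_random))  # prints True
```

In practice such a loss would be minimized over the thermal encoder's weights while the DINOv2 teacher stays frozen, with each training pair consisting of a thermal frame (student input) and its synchronized RGB frame (teacher input).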
Problem

Research questions and friction points this paper is trying to address.

thermal perception
universal representation
task-agnostic
cross-modal
generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

thermal representation learning
foundation model distillation
cross-modal perception
RGB-Thermal dataset
task-agnostic backbone
Parv Maheshwari
Student, Carnegie Mellon University
Robotics, Motion Control, Path Planning
Jay Karhade
Carnegie Mellon University
Robotics, Computer Vision, Perception
Yogesh Chawla
Biological Systems Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
Isaiah Adu
Mechanical Engineering, Penn State University, University Park, PA, USA
Florian Heisen
School of Engineering and Design, Technical University of Munich, Munich, Germany
Andrew Porco
Mechanical Engineering, Carnegie Mellon University, Pittsburgh, PA, USA
Andrew Jong
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Yifei Liu
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, USA
Santosh Pitla
University of Nebraska-Lincoln
Machine Automation and Agricultural Robotics
Sebastian Scherer
Associate Research Professor, Carnegie Mellon University
Robotics, UAS, Obstacle Avoidance, Perception, Planning
Wenshan Wang
Carnegie Mellon University
Robotics, Machine Learning, Artificial Intelligence