NAIMA: Semantics Aware RGB Guided Depth Super-Resolution

📅 2026-04-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses boundary blurring and artifacts in RGB-guided depth super-resolution caused by misleading color and texture cues. To this end, it introduces, for the first time, global semantic priors from the DINOv2 pre-trained vision transformer and proposes a Guided Token Attention module that iteratively aligns RGB and depth features through cross-modal attention. It further designs a multi-level strategy that selectively injects semantic context to enable semantics-aware depth map reconstruction. The proposed method consistently outperforms existing approaches across multiple datasets and scale factors, markedly sharpening depth boundaries and improving the recovery of structural details.
📝 Abstract
Guided depth super-resolution (GDSR) is a multi-modal approach to depth map super-resolution that relies on a low-resolution depth map and a high-resolution RGB image to restore finer structural details. However, misleading color and texture cues suggesting depth discontinuities in RGB images often lead to artifacts and blurred depth boundaries in the generated depth map. We propose a solution that introduces global contextual semantic priors, generated from pretrained vision transformer token embeddings. Our approach to distilling semantic knowledge from pretrained token embeddings is motivated by their demonstrated effectiveness in the related task of monocular depth estimation. We introduce a Guided Token Attention (GTA) module, which iteratively aligns encoded RGB spatial features with depth encodings, using cross-attention to selectively inject global semantic context extracted from different layers of a pretrained vision transformer. Additionally, we present an architecture called Neural Attention for Implicit Multi-token Alignment (NAIMA), which integrates DINOv2 with GTA blocks for semantics-aware GDSR. Our proposed architecture, with its ability to distill semantic knowledge, achieves significant improvements over existing methods across multiple scaling factors and datasets.
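The cross-attention idea underlying the GTA module can be illustrated with a minimal, dependency-free sketch. This is not the paper's implementation: it omits the learned query/key/value projections, multi-head structure, and DINOv2 features, and simply treats depth tokens as queries attending over RGB tokens (keys and values), so each refined depth token becomes a semantically weighted mixture of RGB context. All names and dimensions here are illustrative assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def cross_attention(depth_tokens, rgb_tokens):
    """Single-head cross-attention with identity projections (a sketch).

    depth_tokens: list of query vectors (lists of floats).
    rgb_tokens:   list of key/value vectors of the same dimension.
    Returns one refined vector per depth token: a convex combination
    of the RGB tokens, weighted by scaled dot-product similarity.
    """
    d = len(rgb_tokens[0])
    out = []
    for q in depth_tokens:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in rgb_tokens]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, rgb_tokens))
                    for j in range(d)])
    return out
```

With one-hot RGB tokens, each output row is a probability-weighted blend, and a depth query aligned with a particular RGB token draws most of its weight from it; in the full method this mixing would happen per GTA block, with semantic context injected from several transformer layers.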
Problem

Research questions and friction points this paper is trying to address.

guided depth super-resolution
depth map
RGB guidance
semantic priors
depth boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

guided depth super-resolution
semantic priors
vision transformer
cross-attention
token alignment
Tayyab Nasir
The University of Western Australia
Daochang Liu
Lecturer, University of Western Australia
Computer Vision · Generative AI · Human Action Understanding · Healthcare Data Science
Ajmal Mian
The University of Western Australia