Weighted Reverse Convolution for Feature Upsampling

📅 2026-05-17
📈 Citations: 0
Influential: 0

🤖 AI Summary
Pretrained vision foundation models produce feature maps with limited spatial resolution, which hinders their performance on fine-grained localization and dense prediction tasks. This work formulates feature upsampling as an inverse problem and introduces Weighted Reverse Convolution (WRC), which uses a spatially adaptive, weighted Tikhonov-regularized least-squares formulation to reconstruct high-resolution features while preserving structural details and avoiding over-smoothing. WRC incorporates learnable, spatially varying weights that modulate both data fidelity and regularization strength, and it retains a differentiable closed-form solution computed with the fast Fourier transform (FFT) for efficient dense feature reconstruction. Experiments show that WRC consistently improves dense feature quality across diverse tasks, including semantic segmentation, depth estimation, video object segmentation, object discovery, and keypoint matching, while maintaining high computational efficiency.
📝 Abstract
Pre-trained vision foundation models (VFMs) provide strong semantic representations, yet their patch-level features are inherently coarse, limiting their effectiveness on tasks requiring fine-grained localization, dense prediction, and point-wise correspondence. In this work, we revisit feature upsampling for VFMs from the perspective of an inverse problem and propose Weighted Reverse Convolution (WRC), a spatially adaptive inverse operator for densifying high-level visual descriptors. Specifically, we formulate feature upsampling as a weighted Tikhonov-regularized least-squares problem, where spatially varying weights modulate both data fidelity and prior strength at each spatial location. This allows WRC to adapt the reconstruction to spatially varying feature characteristics, thereby preserving critical structures while mitigating over-smoothing. Moreover, WRC retains an efficient, fully differentiable closed-form FFT solution, making it a practical drop-in upsampling operator. Integrated into a lightweight self-supervised densification framework, WRC consistently improves dense feature quality across various downstream benchmarks, including segmentation, depth estimation, video object segmentation, object discovery, and keypoint correspondence, while maintaining high computational efficiency.
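To make the "closed-form FFT solution" idea concrete, here is a minimal NumPy sketch of the classical, unweighted Tikhonov-regularized deconvolution. It is not the WRC operator: it assumes circular convolution, no downsampling, a single scalar regularization weight, and one feature channel, whereas the abstract describes spatially varying weights. The function names (`kernel_to_otf`, `circular_blur`, `tikhonov_fft_solve`) and the prior `x0` are hypothetical and chosen only for illustration.

```python
import numpy as np

def kernel_to_otf(kernel, shape):
    """Zero-pad a small kernel to `shape`, centre it at the origin, and take its 2-D FFT."""
    kh, kw = kernel.shape
    padded = np.zeros(shape)
    padded[:kh, :kw] = kernel
    # Shift so the kernel centre sits at index (0, 0), as circular convolution expects.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    return np.fft.fft2(padded)

def circular_blur(x, kernel):
    """Forward model (assumption): y = k (*) x with circular boundary conditions."""
    K = kernel_to_otf(kernel, x.shape)
    return np.real(np.fft.ifft2(K * np.fft.fft2(x)))

def tikhonov_fft_solve(y, kernel, x0, lam):
    """Closed-form minimiser of ||k (*) x - y||^2 + lam * ||x - x0||^2.

    Because circular convolution is diagonal in the Fourier domain, the normal
    equations decouple per frequency:
        X = (conj(K) * Y + lam * X0) / (|K|^2 + lam)
    """
    K = kernel_to_otf(kernel, y.shape)
    Y = np.fft.fft2(y)
    X0 = np.fft.fft2(x0)
    X = (np.conj(K) * Y + lam * X0) / (np.abs(K) ** 2 + lam)
    return np.real(np.fft.ifft2(X))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_true = rng.standard_normal((8, 8))   # stand-in for one feature channel
    box = np.ones((3, 3)) / 9.0            # hypothetical blur kernel
    y = circular_blur(x_true, box)
    x_hat = tikhonov_fft_solve(y, box, np.zeros_like(y), lam=1e-3)
    print("reconstruction error:", np.abs(x_hat - x_true).max())
```

A larger `lam` pulls the solution toward the prior `x0` and a smaller one trusts the observation `y` more. In this sketch that trade-off is one global scalar; the abstract's contribution is to let it vary per spatial location while still keeping an FFT-based closed form.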
Problem

Research questions and friction points this paper is trying to address.

feature upsampling
vision foundation models
dense prediction
fine-grained localization
point-wise correspondence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Weighted Reverse Convolution
feature upsampling
inverse problem
spatially adaptive
dense feature learning