🤖 AI Summary
In computer vision, objects within the same image exhibit local scale variations, for example because they sit at different distances from the camera, so different objects may change size differently within a single image, leading to inconsistent model predictions. To address this, the authors propose the Deep Equilibrium Canonicalizer (DEC), a module that improves a model's local scale equivariance. DEC is plug-and-play: it can be incorporated into existing network architectures and adapted to pre-trained models without architectural modification, including ViT, DeiT, Swin, and BEiT. On the competitive ImageNet benchmark, DEC improves both top-1 accuracy and local scale consistency across all four model families, demonstrating its effectiveness and generality for scale-robust recognition. The implementation is publicly available.
📝 Abstract
Scale variation is a fundamental challenge in computer vision. Objects of the same class can have different sizes, and their perceived size is further affected by the distance from the camera. These variations are local to the objects, i.e., different objects may scale differently within the same image. To effectively handle scale variations, we present a deep equilibrium canonicalizer (DEC) to improve the local scale equivariance of a model. DEC can be easily incorporated into existing network architectures and can be adapted to a pre-trained model. Notably, we show that on the competitive ImageNet benchmark, DEC improves both model performance and local scale consistency across four popular pre-trained deep-nets, e.g., ViT, DeiT, Swin, and BEiT. Our code is available at https://github.com/ashiq24/local-scale-equivariance.
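The abstract does not spell out how the canonicalizer works internally, but the "deep equilibrium" naming suggests a fixed-point formulation: a canonical (scale-normalized) representation is found as the equilibrium of an iterated map, rather than by a fixed number of layers. Below is a minimal, hypothetical sketch of that idea, where `toy_predictor` is a stand-in for a learned scale predictor (not the paper's actual architecture) and the fixed-point solve uses plain iteration:

```python
import numpy as np

def equilibrium_canonical_scale(x, predictor, s0=1.0, tol=1e-6, max_iter=100):
    """Solve for a canonical scale s* satisfying s* = predictor(x, s*)
    by fixed-point iteration (the 'deep equilibrium' idea in miniature)."""
    s = s0
    for _ in range(max_iter):
        s_new = predictor(x, s)
        if abs(s_new - s) < tol:
            break
        s = s_new
    return s

def toy_predictor(x, s):
    """Hypothetical learned predictor, modeled as a contraction mapping:
    it pulls the current scale estimate halfway toward a data-dependent
    target, so iteration converges to that target."""
    target = float(np.mean(x))  # pretend this encodes the object's local scale
    return 0.5 * s + 0.5 * target

# Example: an input whose 'local scale' statistic is 2.0
x = np.array([2.0, 2.0, 2.0])
s_star = equilibrium_canonical_scale(x, toy_predictor)
# The iteration converges to the fixed point s* = 2.0; the input could then
# be resampled by 1/s_star to canonicalize its scale before the backbone.
```

Because the solve depends only on the fixed point, not the iteration path, such a module can in principle be wrapped around a frozen pre-trained backbone, which is consistent with the abstract's claim that DEC adapts to pre-trained models.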