🤖 AI Summary
Existing class-agnostic counting (CAC) methods, while flexible at inference time, remain heavily reliant on annotated training data, which limits their scalability and generalization. To address this, we propose CountingDINO, the first fully training-free exemplar-based CAC framework: it leverages a self-supervised DINO backbone to extract object-aware features, employs ROI-Align to harvest class-agnostic prototypes from exemplar regions, uses these prototypes as convolutional kernels to produce similarity maps, and normalizes those maps into density maps, all at inference time, requiring neither annotations nor model fine-tuning. This makes it the first exemplar-guided CAC method driven by a fully unsupervised visual backbone. On the FSC-147 benchmark, our approach outperforms the baseline under the same label-free setting and achieves competitive, and in some cases superior, results compared to training-free methods relying on supervised backbones, as well as several fully supervised state-of-the-art methods, demonstrating that the training-free paradigm can be both scalable and competitive.
📝 Abstract
Class-agnostic counting (CAC) aims to estimate the number of objects in images without being restricted to predefined categories. However, while current exemplar-based CAC methods offer flexibility at inference time, they still rely heavily on labeled data for training, which limits scalability and generalization to many downstream use cases. In this paper, we introduce CountingDINO, the first training-free exemplar-based CAC framework that exploits a fully unsupervised feature extractor. Specifically, our approach employs self-supervised vision-only backbones to extract object-aware features, and it eliminates the need for annotated data throughout the entire proposed pipeline. At inference time, we extract latent object prototypes via ROI-Align from DINO features and use them as convolutional kernels to generate similarity maps. These are then transformed into density maps through a simple yet effective normalization scheme. We evaluate our approach on the FSC-147 benchmark, where we outperform a baseline under the same label-free setting. Our method also achieves competitive, and in some cases superior, results compared to training-free approaches relying on supervised backbones, as well as several fully supervised state-of-the-art methods. This demonstrates that training-free CAC can be both scalable and competitive. Website: https://lorebianchi98.github.io/CountingDINO/
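The inference pipeline described above (prototype extraction from an exemplar box, similarity via convolution with that prototype, normalization into a density map) can be sketched in miniature. The NumPy toy below is an illustration under simplifying assumptions, not the authors' implementation: a plain average pool stands in for ROI-Align, the feature map is synthetic rather than a real DINO output, and the normalization (dividing the similarity map by its mass inside the exemplar box so that one instance contributes roughly one to the total) only approximates the paper's scheme. All function names are hypothetical.

```python
import numpy as np

def extract_prototype(feat, box):
    """Average-pool the features inside the exemplar box (stand-in for ROI-Align).

    feat: (C, H, W) feature map; box: (y0, x0, y1, x1) in feature-map coordinates.
    Returns a C-dimensional prototype vector.
    """
    y0, x0, y1, x1 = box
    return feat[:, y0:y1, x0:x1].mean(axis=(1, 2))

def similarity_map(feat, proto):
    """Cosine similarity between the prototype and every spatial location.

    Equivalent to a 1x1 convolution with the (L2-normalized) prototype as kernel.
    """
    f = feat / (np.linalg.norm(feat, axis=0, keepdims=True) + 1e-8)
    p = proto / (np.linalg.norm(proto) + 1e-8)
    return np.einsum("chw,c->hw", f, p)

def to_density(sim, box):
    """Turn a similarity map into a density map (toy normalization).

    Divide by the similarity mass inside the exemplar box, so the single
    exemplar instance integrates to ~1; the map's sum then estimates the count.
    """
    y0, x0, y1, x1 = box
    sim = np.clip(sim, 0.0, None)
    mass = sim[y0:y1, x0:x1].sum() + 1e-8
    return sim / mass

# Synthetic demo: a 4-channel, 8x8 feature map with the same "object"
# signature planted at three locations; the exemplar box covers one of them.
feat = np.zeros((4, 8, 8))
obj = np.array([1.0, 0.5, 0.0, 0.2])
for y, x in [(2, 2), (5, 5), (6, 1)]:
    feat[:, y, x] = obj
box = (2, 2, 3, 3)  # tight box around the instance at (2, 2)

proto = extract_prototype(feat, box)
density = to_density(similarity_map(feat, proto), box)
count = density.sum()
print(round(float(count), 2))  # three identical instances -> count near 3
```

In the synthetic demo the three planted instances each produce a unit cosine response, so the normalized density sums to approximately three; with real backbone features the responses are softer, which is why the normalization step matters.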