A Simple-but-effective Baseline for Training-free Class-Agnostic Counting

📅 2024-03-03

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses class-agnostic counting (CAC): counting objects of an arbitrary class in an image given only a few reference examples, here without any additional training. To narrow the accuracy gap between existing training-free approaches and training-based ones, the method combines four components: (1) a superpixel algorithm that generates more precise initial point prompts; (2) an image encoder with richer semantic knowledge that replaces SAM's encoder for representing candidate objects; (3) a multiscale mechanism; and (4) a transductive prototype scheme, used to update the representation of the reference examples. According to the abstract, the combined approach improves significantly over existing training-free methods and performs on par with training-based ones, which positions it as a strong training-free baseline.
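To make the first component more concrete, here is a minimal sketch of how superpixels could be turned into point prompts. It assumes a superpixel label map is already available from some superpixel algorithm and simply places one prompt at each region's centroid. The function name `superpixel_point_prompts` and the centroid rule are illustrative assumptions, not the paper's confirmed procedure.

```python
from collections import defaultdict


def superpixel_point_prompts(labels):
    """Return one (row, col) point prompt per superpixel: the region centroid.

    `labels` is a 2D list of integer superpixel ids, as produced by any
    superpixel algorithm (hypothetical input; not the paper's exact pipeline).
    """
    # id -> [row_sum, col_sum, pixel_count]
    sums = defaultdict(lambda: [0, 0, 0])
    for r, row in enumerate(labels):
        for c, sp_id in enumerate(row):
            acc = sums[sp_id]
            acc[0] += r
            acc[1] += c
            acc[2] += 1
    # Centroid = mean row and mean column of the pixels in each region.
    return {sp_id: (rs / n, cs / n) for sp_id, (rs, cs, n) in sorted(sums.items())}


# Toy 3x4 label map with three superpixels.
labels = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 2, 2],
]
prompts = superpixel_point_prompts(labels)
```

Each centroid could then be passed to a promptable segmenter as a single foreground point. For strongly non-convex regions a centroid may fall outside its superpixel, so a real pipeline would likely need a fallback such as picking the region pixel nearest to the centroid.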

📝 Abstract

Class-Agnostic Counting (CAC) seeks to accurately count objects in a given image with only a few reference examples. While previous methods achieving this relied on additional training, recent efforts have shown that it's possible to accomplish this without training by utilizing pre-existing foundation models, particularly the Segment Anything Model (SAM), for counting via instance-level segmentation. Although promising, current training-free methods still lag behind their training-based counterparts in terms of performance. In this research, we present a straightforward training-free solution that effectively bridges this performance gap, serving as a strong baseline. The primary contribution of our work lies in the discovery of four key technologies that can enhance performance. Specifically, we suggest employing a superpixel algorithm to generate more precise initial point prompts, utilizing an image encoder with richer semantic knowledge to replace the SAM encoder for representing candidate objects, and adopting a multiscale mechanism and a transductive prototype scheme to update the representation of reference examples. By combining these four technologies, our approach achieves significant improvements over existing training-free methods and delivers performance on par with training-based ones.
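The abstract does not spell out the transductive prototype scheme, so the following is only a sketch of the general idea under stated assumptions: candidate objects are represented by feature vectors, a prototype is the mean of the reference features, candidates are matched by cosine similarity, and confidently matched candidates are folded back into the prototype over a few rounds. The function names, the `threshold` and `rounds` parameters, and the toy 2-D features are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def count_with_transductive_prototype(references, candidates, threshold=0.8, rounds=3):
    """Count candidates similar to the references, refining the prototype.

    Illustrative assumption, not the paper's algorithm: each round rebuilds
    the prototype from the references plus the currently matched candidates.
    """
    def match(prototype):
        return [i for i, f in enumerate(candidates) if cosine(prototype, f) >= threshold]

    prototype = mean_vector(references)
    for _ in range(rounds):
        matched = match(prototype)
        prototype = mean_vector(references + [candidates[i] for i in matched])
    matched = match(prototype)  # final assignment with the refined prototype
    return len(matched), matched


references = [[1.0, 0.0]]
candidates = [[0.9, 0.3], [0.8, 0.5], [0.6, 0.6], [0.0, 1.0]]
count, matched = count_with_transductive_prototype(references, candidates)
baseline_count, _ = count_with_transductive_prototype(references, candidates, rounds=0)
```

In this toy example the third candidate falls below the threshold against the single reference alone, but is matched once the prototype has absorbed the first two candidates, so the refined count is 3 while the no-update count (`rounds=0`) is 2. The same mechanism can also drift toward unrelated objects if the threshold is too loose, which is why both hyperparameters here should be read as placeholders.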

Problem

Research questions and friction points this paper is trying to address.

Class-Agnostic Counting

Accuracy

Non-Learning Approach

Innovation

Methods, ideas, or system contributions that make the work stand out.

Class-Agnostic Counting

Multi-Angle Analysis

Enhanced Object Initialization
