A Simple-but-effective Baseline for Training-free Class-Agnostic Counting

📅 2024-03-03
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses class-agnostic counting (CAC) and proposes a training-free method that counts objects of arbitrary classes using only a few reference examples. To close the accuracy gap of existing training-free approaches, it combines four technical components: (1) superpixel-guided point prompts that give more precise initial localization; (2) replacement of SAM's image encoder with a semantically richer encoder for representing candidate objects; (3) a multiscale mechanism that improves robustness to object size; and (4) a transductive prototype scheme that updates the representation of the reference examples. On standard CAC benchmarks, the method significantly outperforms existing training-free methods and performs on par with training-based ones, showing that a training-free approach can be competitive for class-agnostic counting.
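Component (1) turns a superpixel segmentation into point prompts for SAM. The sketch below is not the authors' code: it assumes a superpixel label map is already available (from any algorithm, e.g. SLIC) and uses a hypothetical helper `superpixel_point_prompts` that emits one prompt per superpixel. It picks the member pixel nearest each region's centroid, so the prompt stays inside the superpixel even when the region is non-convex.

```python
import numpy as np


def superpixel_point_prompts(labels: np.ndarray) -> np.ndarray:
    """Return one (x, y) point prompt per superpixel (illustrative sketch).

    `labels` is an H x W integer map produced by some superpixel algorithm;
    how that map is computed is an assumption outside this sketch.
    """
    points = []
    for k in np.unique(labels):
        ys, xs = np.nonzero(labels == k)      # pixels belonging to superpixel k
        cy, cx = ys.mean(), xs.mean()         # region centroid (may lie outside the region)
        # Choose the member pixel closest to the centroid so the prompt is inside the region.
        i = np.argmin((ys - cy) ** 2 + (xs - cx) ** 2)
        points.append((int(xs[i]), int(ys[i])))
    return np.array(points, dtype=int)


# Toy 4 x 4 map with two superpixels: left half (label 0) and right half (label 1).
labels = np.array([[0, 0, 1, 1]] * 4)
prompts = superpixel_point_prompts(labels)
print(prompts.tolist())  # [[0, 1], [2, 1]]
```

Each prompt could then be passed to SAM as a foreground point to obtain one candidate mask per superpixel; compared with a uniform grid of points, this places prompts according to image structure rather than fixed spacing.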


๐Ÿ“ Abstract
Class-Agnostic Counting (CAC) seeks to accurately count objects in a given image with only a few reference examples. While earlier methods for this task relied on additional training, recent efforts have shown that it can be accomplished without training by using pre-existing foundation models, particularly the Segment Anything Model (SAM), to count via instance-level segmentation. Although promising, current training-free methods still lag behind their training-based counterparts in performance. In this work, we present a straightforward training-free solution that effectively bridges this performance gap and serves as a strong baseline. Our primary contribution is the identification of four key techniques that improve performance. Specifically, we propose employing a superpixel algorithm to generate more precise initial point prompts, using an image encoder with richer semantic knowledge in place of the SAM encoder to represent candidate objects, and adopting a multiscale mechanism and a transductive prototype scheme to update the representation of reference examples. By combining these four techniques, our approach achieves significant improvements over existing training-free methods and delivers performance on par with training-based ones.
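The abstract does not spell out the transductive prototype scheme, so the following is only a minimal sketch of the general idea under stated assumptions. The hypothetical function `count_with_transductive_prototype` takes precomputed feature vectors for candidate objects and reference examples, builds a prototype from the exemplar features, counts candidates whose cosine similarity passes a threshold (the `threshold` and `n_iters` values are illustrative choices), and then refines the prototype by folding in the accepted candidates from the test image itself, which is what makes the update transductive.

```python
import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors along the last axis to unit length."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def count_with_transductive_prototype(candidates, exemplars, threshold=0.8, n_iters=2):
    """Count candidates matching a prototype that is refined on the test image.

    candidates: N x D features of candidate objects (e.g. one per SAM mask).
    exemplars:  K x D features of the reference examples.
    The update rule (mean of exemplars plus accepted candidates) is an
    assumption for illustration, not the paper's exact scheme.
    """
    cand = l2_normalize(np.asarray(candidates, dtype=float))
    ex = l2_normalize(np.asarray(exemplars, dtype=float))
    proto = ex.mean(axis=0)  # initial prototype from the reference examples only
    for _ in range(n_iters):
        sims = cand @ l2_normalize(proto)   # cosine similarity to the current prototype
        accepted = sims >= threshold
        # Transductive step: accepted candidates from this image join the prototype.
        proto = np.vstack([ex, cand[accepted]]).mean(axis=0)
    sims = cand @ l2_normalize(proto)
    return int((sims >= threshold).sum()), sims


exemplars = [[1.0, 0.0], [0.9, 0.1]]
candidates = [[1.0, 0.0], [0.95, 0.05], [0.0, 1.0], [0.8, 0.2]]
count, sims = count_with_transductive_prototype(candidates, exemplars)
print(count)  # 3
```

Because the refined prototype mixes in objects from the query image, it can adapt to appearance variation that the few reference examples do not cover; the threshold then acts as the decision boundary between counted and rejected candidates.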
Problem

Research questions and friction points this paper is trying to address.

Class-Agnostic Counting
Accuracy
Non-Learning Approach
Innovation

Methods, ideas, or system contributions that make the work stand out.

Class-Agnostic Counting
Multi-Angle Analysis
Enhanced Object Initialization
Yuhao Lin
Australian Institute for Machine Learning, The University of Adelaide
Hai-Ming Xu
TikTok
Machine Learning, Computer Vision
Lingqiao Liu
Associate Professor at the University of Adelaide
computer vision, machine learning
Javen Qinfeng Shi
Australian Institute for Machine Learning, The University of Adelaide