AI Summary
This work addresses class-agnostic counting (CAC) and proposes a training-free method that counts objects of arbitrary classes from only a few reference examples. To close the accuracy gap left by existing training-free approaches, it combines four technical components: (1) superpixel-guided point prompts that give more precise initial localization; (2) replacement of SAM's image encoder with a semantically richer encoder for representing candidate objects; (3) a multi-scale mechanism to handle objects of varying size; and (4) a transductive prototype scheme that updates the representation of the reference examples. Together, these components yield significant improvements over existing training-free methods and performance on par with training-based ones, which supports the viability of a "zero-training" approach to class-agnostic counting.
Abstract
Class-Agnostic Counting (CAC) seeks to accurately count objects in a given image with only a few reference examples. While previous methods relied on additional training, recent efforts have shown that counting is possible without training by using pre-existing foundation models, particularly the Segment Anything Model (SAM), to count via instance-level segmentation. Although promising, current training-free methods still lag behind their training-based counterparts in performance. In this research, we present a straightforward training-free solution that effectively bridges this gap and serves as a strong baseline. The primary contribution of our work is the identification of four key techniques that enhance performance. Specifically, we suggest employing a superpixel algorithm to generate more precise initial point prompts, replacing the SAM encoder with an image encoder that carries richer semantic knowledge for representing candidate objects, and adopting a multiscale mechanism and a transductive prototype scheme to update the representation of the reference examples. By combining these four techniques, our approach achieves significant improvements over existing training-free methods and delivers performance on par with training-based ones.
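As a rough illustration of the transductive prototype idea, the sketch below averages the reference features into a prototype, matches candidates by cosine similarity, and then folds the matched candidates back into the prototype before matching again. The abstract does not specify the update rule, so the averaging step, the fixed threshold, and the names `count_with_transductive_prototype`, `threshold`, and `rounds` are all assumptions made for this example, not the authors' implementation. The feature vectors are toy values standing in for encoder outputs.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def count_with_transductive_prototype(reference_feats, candidate_feats,
                                      threshold=0.8, rounds=3):
    """Count candidates that match a prototype refined on the test image.

    Assumed update rule (hypothetical): the prototype is the mean of the
    reference features plus all currently matched candidate features.
    """
    prototype = mean_vector(reference_feats)
    matched = []
    for _ in range(rounds):
        new_matched = [
            i for i, feat in enumerate(candidate_feats)
            if cosine(prototype, feat) >= threshold
        ]
        if new_matched == matched:
            break  # assignments are stable, stop early
        matched = new_matched
        # Fold the matched candidates back into the prototype.
        prototype = mean_vector(
            reference_feats + [candidate_feats[i] for i in matched]
        )
    return len(matched), matched


# Toy 2-D features: one reference example and three candidate masks.
references = [[1.0, 0.0]]
candidates = [[0.9, 0.3], [0.6, 0.6], [0.0, 1.0]]

single_pass = count_with_transductive_prototype(references, candidates, rounds=1)
refined = count_with_transductive_prototype(references, candidates, rounds=3)
print(single_pass, refined)
```

In this toy case, a single pass matches only the first candidate, while the refined prototype also picks up the second one. That is the kind of effect a transductive update is meant to produce when the reference examples alone do not cover the appearance variation in the image.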