🤖 AI Summary
Existing few-shot/zero-shot object counting methods suffer from ambiguity in fine-grained categories (e.g., near-identical biological individuals) and rely heavily on labor-intensive manual annotations. To address this, we propose the first unsupervised fine-grained counting adaptation paradigm: (1) leveraging latent-variable generative models to synthesize high-density fine-grained scenes with corresponding pseudo-labels; (2) designing a pseudo-supervised, attention-based boundary prediction network to precisely localize highly similar objects; and (3) integrating this network into a few-shot/zero-shot counting framework via synthetic-data-driven fine-tuning. To support rigorous evaluation, we introduce FGTC, the first fine-grained counting benchmark for natural images. Experiments demonstrate that our method significantly outperforms state-of-the-art approaches on biological fine-grained counting tasks, achieving cross-category generalization while trained solely on synthetic data, without any human annotation.
📝 Abstract
Few/zero-shot object counting methods reduce the need for extensive annotations but often struggle to distinguish between fine-grained categories, especially when multiple similar objects appear in the same scene. To address this limitation, we propose an annotation-free approach that enables the seamless integration of new fine-grained categories into existing few/zero-shot counting models. By leveraging latent generative models, we synthesize high-quality, category-specific crowded scenes, providing a rich training source for adapting to new categories without manual labeling. Our approach introduces an attention prediction network, trained using only synthetic pseudo-annotated data, that identifies fine-grained category boundaries. At inference, these fine-grained attention estimates refine the outputs of existing few/zero-shot counting networks. To benchmark our method, we further introduce FGTC, a taxonomy-specific fine-grained object counting dataset for natural images. Our method substantially enhances pre-trained state-of-the-art models on fine-grained taxon counting tasks while using only synthetic data. Code and data will be released upon acceptance.