Learning What NOT to Count

📅 2025-04-16
🤖 AI Summary
Existing few-shot/zero-shot object counting methods struggle to separate fine-grained categories (e.g., visually near-identical biological taxa), and adapting them to new categories normally requires labor-intensive manual annotation. To address this, we propose an annotation-free adaptation approach for fine-grained counting: (1) latent generative models synthesize crowded, category-specific scenes together with pseudo-annotations; (2) an attention prediction network, trained only on this synthetic pseudo-annotated data, identifies fine-grained category boundaries among highly similar objects; and (3) at inference, its attention estimates refine the output of existing few-shot/zero-shot counting networks. For evaluation, we introduce FGTC, a taxonomy-specific fine-grained object counting dataset of natural images. Experiments show that the method substantially improves pre-trained state-of-the-art models on fine-grained taxon counting tasks while being trained only on synthetic data, with no human annotation.

📝 Abstract
Few/zero-shot object counting methods reduce the need for extensive annotations but often struggle to distinguish between fine-grained categories, especially when multiple similar objects appear in the same scene. To address this limitation, we propose an annotation-free approach that enables the seamless integration of new fine-grained categories into existing few/zero-shot counting models. By leveraging latent generative models, we synthesize high-quality, category-specific crowded scenes, providing a rich training source for adapting to new categories without manual labeling. Our approach introduces an attention prediction network that identifies fine-grained category boundaries trained using only synthetic pseudo-annotated data. At inference, these fine-grained attention estimates refine the output of existing few/zero-shot counting networks. To benchmark our method, we further introduce the FGTC dataset, a taxonomy-specific fine-grained object counting dataset for natural images. Our method substantially enhances pre-trained state-of-the-art models on fine-grained taxon counting tasks, while using only synthetic data. Code and data to be released upon acceptance.
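
The abstract states that, at inference, fine-grained attention estimates refine the output of an existing counting network, but it does not say how the two are combined. The sketch below shows one plausible reading, assuming the base counter emits a density map and the attention network emits a same-sized map of per-pixel scores in [0, 1]. The function name `refine_density`, the hard threshold, and the element-wise masking are all illustrative assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Grid = List[List[float]]


def refine_density(
    density: Grid, attention: Grid, threshold: float = 0.5
) -> Tuple[Grid, float]:
    """Mask a density map with an attention map and return the refined count.

    Assumption (not from the paper): density is kept where the attention
    score reaches `threshold` and is zeroed everywhere else.
    """
    same_rows = len(density) == len(attention)
    if not same_rows or any(len(d) != len(a) for d, a in zip(density, attention)):
        raise ValueError("density and attention maps must have the same shape")

    refined = [
        [d if a >= threshold else 0.0 for d, a in zip(d_row, a_row)]
        for d_row, a_row in zip(density, attention)
    ]
    # The count is the integral (sum) of the density map.
    count = sum(sum(row) for row in refined)
    return refined, count


# Toy example: the left 2x2 block is the target category (density sums to 2),
# and the single right-hand blob is a visually similar distractor (density 1).
density = [
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 1.0, 0.0],
]
attention = [
    [0.9, 0.9, 0.1, 0.1],
    [0.9, 0.9, 0.2, 0.1],
]

raw_count = sum(sum(row) for row in density)
_, refined_count = refine_density(density, attention)
print(raw_count, refined_count)  # 3.0 2.0
```

In this toy case the unrefined counter reports 3 objects because it cannot tell the distractor apart, while the masked map reports 2. A soft variant that multiplies density by the attention score instead of thresholding would be an equally reasonable assumption.
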
Problem

Research questions and friction points this paper is trying to address.

Distinguishing fine-grained categories in few/zero-shot object counting
Eliminating manual labeling for new category integration
Improving counting accuracy with synthetic data and attention networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Annotation-free fine-grained category integration
Latent generative models synthesize crowded scenes
Attention network refines few/zero-shot counting