The MixCount Dataset: Bridging the Data Gap for Open-Vocabulary Object Counting

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing object counting methods struggle in mixed-object scenes, primarily because real-world annotations are expensive and noisy while existing synthetic data lacks diversity and realism. To address this, this work proposes a synthetic data generation framework for open-vocabulary mixed-object counting, which the summary describes as the first high-quality one of its kind. By combining automated image synthesis, fine-grained textual descriptions, and pixel-perfect counting annotations, the framework constructs MixCount, a large-scale dataset and benchmark free of labeling ambiguity. Training on the synthesized data reduces mean absolute error (MAE) by 20.14% on FSC-147 and by 18.3% on PairTally, indicating improved generalization to real-world scenes.
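
To make the reported metric concrete, the sketch below computes MAE over per-image counts and the percentage by which one model's MAE improves on another's. It assumes the reported figures are relative reductions, which the summary does not state explicitly. The function names and all counts are hypothetical illustrations, not values from the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute difference between predicted and true object counts."""
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("predicted and actual must be non-empty and equal in length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def relative_reduction(baseline_mae: float, new_mae: float) -> float:
    """Percentage by which new_mae is lower than baseline_mae."""
    if baseline_mae <= 0:
        raise ValueError("baseline_mae must be positive")
    return 100.0 * (baseline_mae - new_mae) / baseline_mae


# Hypothetical per-image counts, invented purely for illustration.
actual = [12, 30, 7, 45]
baseline_pred = [15, 24, 7, 50]   # absolute errors: 3, 6, 0, 5
finetuned_pred = [13, 27, 7, 48]  # absolute errors: 1, 3, 0, 3

baseline_mae = mean_absolute_error(baseline_pred, actual)
finetuned_mae = mean_absolute_error(finetuned_pred, actual)
print(baseline_mae, finetuned_mae, relative_reduction(baseline_mae, finetuned_mae))
```

Under this reading, a 20.14% reduction means the MAE after training on the synthesized data is about 0.80 times the baseline MAE.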

📝 Abstract

Object counting is a foundational vision task with over a decade of dedicated research, yet state-of-the-art models still fail systematically in the mixed-object setting that dominates real-world applications such as industrial inspection and product sorting. We show that this gap is strongly driven by limitations in existing training and evaluation data: real counting datasets are prohibitively expensive to annotate and suffer from labeling noise, while existing synthetic alternatives lack diversity and realism. We address this with MixCount, a dataset and benchmark for mixed-object counting designed to target the failure modes of current counting models. To overcome the high cost of constructing and labeling such data, we develop an automatic generation pipeline that synthesizes images, fine-grained textual descriptions, and pixel-perfect counting annotations at scale, eliminating the labeling ambiguity that plagues prior datasets. Evaluating state-of-the-art counting models on MixCount exposes severe degradation in the mixed-object setting. More importantly, training these models on our synthesized data yields substantial gains on real-world benchmarks, reducing MAE by 20.14% on FSC-147 and by 18.3% on PairTally. These results establish MixCount as both a benchmark and a training dataset for fine-grained counting, and demonstrate that our pipeline, which produces effectively unlimited labeled data, helps address a long-standing bottleneck in counting models.
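
The claim that synthesis removes labeling ambiguity can be illustrated with a toy generator: when a program places every object itself, the per-category counts follow directly from the placement record and need no human annotation. The sketch below is not the paper's pipeline, whose details are not given here. It places non-overlapping circles in place of rendered objects, and every name in it (`Instance`, `synthesize_scene`, `counting_annotation`, `describe`) is a hypothetical chosen for this example.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Instance:
    category: str
    x: float
    y: float
    radius: float


def synthesize_scene(categories: Sequence[str], n_objects: int, size: float = 512.0,
                     radius: float = 12.0, seed: int = 0,
                     max_tries: int = 10_000) -> List[Instance]:
    """Place up to n_objects non-overlapping circles with random categories."""
    rng = random.Random(seed)
    placed: List[Instance] = []
    tries = 0
    while len(placed) < n_objects and tries < max_tries:
        tries += 1
        cand = Instance(rng.choice(categories),
                        rng.uniform(radius, size - radius),
                        rng.uniform(radius, size - radius),
                        radius)
        # Rejection sampling: keep the candidate only if it touches no placed object.
        if all((cand.x - o.x) ** 2 + (cand.y - o.y) ** 2 >= (2 * radius) ** 2
               for o in placed):
            placed.append(cand)
    return placed


def counting_annotation(scene: Sequence[Instance]) -> Dict[str, int]:
    """Exact per-category counts, read directly from the placement record."""
    return dict(Counter(inst.category for inst in scene))


def describe(counts: Dict[str, int]) -> str:
    """A simple textual description derived from the same record."""
    return "Scene contents: " + ", ".join(f"{name}: {n}" for name, n in sorted(counts.items()))


scene = synthesize_scene(["screw", "nut", "washer"], n_objects=40, seed=1)
counts = counting_annotation(scene)
print(describe(counts))
```

Because the annotation and the description are both computed from the same list of placed instances, they cannot disagree with the image content, which is the property the abstract attributes to automatic generation.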

Problem

Research questions and friction points this paper is trying to address.

object counting

mixed-object setting

training data limitation

labeling noise

synthetic dataset

Innovation

Methods, ideas, or system contributions that make the work stand out.

open-vocabulary object counting

synthetic data generation

pixel-perfect annotation