🤖 AI Summary
Low-shot object counters struggle in dense regions with small objects: density-based counters localize poorly, while point-detection-based counters underperform when the number of objects is very large. This paper proposes CoDi, which it presents as the first latent-diffusion-based low-shot counter, an exemplar-conditioned framework that generates density maps. Its core component is an exemplar-based conditioning module that extracts object prototypes from the exemplars and adapts them to the intermediate layers of the denoising network, improving object localization. Object locations are then determined from the generated density map by non-maximum suppression (NMS). On the FSC benchmark, CoDi outperforms the state of the art by 15%, 13%, and 10% in mean absolute error (MAE) under the few-shot, one-shot, and reference-less settings, respectively. On the MCAC benchmark, it outperforms the previous best method by 44% MAE, setting a new state of the art.
📝 Abstract
Low-shot object counting addresses estimating the number of previously unobserved objects in an image using only a few or no annotated test-time exemplars. A considerable challenge for modern low-shot counters is dense regions with small objects. While total counts in such situations are typically well addressed by density-based counters, their usefulness is limited by poor localization capabilities. Localization is better addressed by point-detection-based counters, which build on query-based detectors. However, due to the limited number of pre-trained queries, they underperform on images with very large numbers of objects and resort to ad-hoc techniques such as upsampling and tiling. We propose CoDi, the first latent diffusion-based low-shot counter that produces high-quality density maps on which object locations can be determined by non-maximum suppression. Our core contribution is a new exemplar-based conditioning module that extracts object prototypes and adapts them to the intermediate layers of the denoising network, leading to accurate object location estimation. On the FSC benchmark, CoDi outperforms the state of the art by 15% MAE, 13% MAE, and 10% MAE in the few-shot, one-shot, and reference-less scenarios, respectively, and sets a new state of the art on the MCAC benchmark by outperforming the top method by 44% MAE. The code is available at https://github.com/gsustar/CoDi.
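To illustrate how object locations can be read off a density map by non-maximum suppression, the following is a minimal pure-Python sketch. It is not the CoDi implementation: the function name `density_peaks`, the `window` and `threshold` parameters, and the toy map are all hypothetical, chosen only to show the idea of keeping cells that dominate their local neighbourhood.

```python
from typing import List, Tuple


def density_peaks(
    density: List[List[float]],
    window: int = 1,        # hypothetical: half-size of the suppression neighbourhood
    threshold: float = 0.5  # hypothetical: minimum response for a cell to be considered
) -> List[Tuple[int, int]]:
    """Return (row, col) cells that survive non-maximum suppression.

    A cell is kept if its value is at least `threshold` and no other cell in
    its (2*window+1) x (2*window+1) neighbourhood has a larger value. Ties
    are broken in favour of the cell that comes first in raster order.
    """
    rows = len(density)
    cols = len(density[0]) if rows else 0
    peaks: List[Tuple[int, int]] = []
    for r in range(rows):
        for c in range(cols):
            value = density[r][c]
            if value < threshold:
                continue
            suppressed = False
            # Scan the neighbourhood, clipped at the image border.
            for rr in range(max(0, r - window), min(rows, r + window + 1)):
                for cc in range(max(0, c - window), min(cols, c + window + 1)):
                    if (rr, cc) == (r, c):
                        continue
                    other = density[rr][cc]
                    if other > value or (other == value and (rr, cc) < (r, c)):
                        suppressed = True
                        break
                if suppressed:
                    break
            if not suppressed:
                peaks.append((r, c))
    return peaks


# Toy density map with two well-separated blobs (made-up values).
toy = [
    [0.0, 0.1, 0.0, 0.0, 0.0],
    [0.1, 0.9, 0.1, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.2, 0.0],
    [0.0, 0.0, 0.2, 0.8, 0.1],
    [0.0, 0.0, 0.0, 0.1, 0.0],
]
locations = density_peaks(toy)
print(locations)       # [(1, 1), (3, 3)]
print(len(locations))  # 2
```

In this sketch the number of surviving peaks doubles as a count estimate. Whether CoDi derives its reported counts this way or by another route is not stated in the abstract, so treat that step as an assumption.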