CoDi -- an exemplar-conditioned diffusion model for low-shot counting

📅 2025-12-23

🤖 AI Summary

Accurately localizing small objects in dense regions is difficult for low-shot object counters. To address this, the paper introduces the first latent diffusion-based low-shot counter, an exemplar-conditioned framework that generates density maps. Its exemplar-based conditioning module extracts object prototypes from the exemplars and adjusts them to the intermediate layers of the denoising network, which yields accurate object location estimates. Object locations are then determined on the density map by non-maximum suppression (NMS). On the Few-Shot Counting (FSC) benchmark, the method reduces mean absolute error (MAE) relative to the previous state of the art by 15%, 13%, and 10% in the few-shot, one-shot, and reference-less settings, respectively. On the MCAC benchmark, it outperforms the top method by 44% MAE, setting a new state of the art.
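
The percentage gains above are most naturally read as relative MAE reductions with respect to the previous best method. The minimal sketch below assumes that interpretation and shows the arithmetic; the function names and all counts are hypothetical illustrations, not data from the paper.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], ground_truth: Sequence[float]) -> float:
    """Average absolute difference between predicted and true per-image counts."""
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("expected two non-empty sequences of equal length")
    return sum(abs(p - g) for p, g in zip(predicted, ground_truth)) / len(predicted)


def relative_mae_reduction(new_mae: float, baseline_mae: float) -> float:
    """Fraction by which new_mae improves on baseline_mae (0.15 means 15%)."""
    return (baseline_mae - new_mae) / baseline_mae


# Hypothetical counts for three images; not results from the paper.
gt = [10, 50, 200]
baseline = [12, 45, 177]
ours = [11, 47, 180]
mae_base = mean_absolute_error(baseline, gt)  # 10.0
mae_ours = mean_absolute_error(ours, gt)      # 8.0
print(f"{relative_mae_reduction(mae_ours, mae_base):.0%}")  # 20%
```

Under this reading, "15% MAE" means the new MAE is 85% of the previous best MAE, not that it is 15 counts lower.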

📝 Abstract

Low-shot object counting is the task of estimating the number of previously unobserved objects in an image using only a few or no annotated test-time exemplars. Dense regions with small objects pose a considerable challenge for modern low-shot counters. While density-based counters typically estimate total counts well in such situations, their usefulness is limited by poor localization capabilities. Localization is better addressed by point-detection-based counters, which are built on query-based detectors. However, due to the limited number of pre-trained queries, they underperform on images with very large numbers of objects and resort to ad-hoc techniques such as upsampling and tiling. We propose CoDi, the first latent diffusion-based low-shot counter, which produces high-quality density maps on which object locations can be determined by non-maxima suppression. Our core contribution is a new exemplar-based conditioning module that extracts object prototypes and adjusts them to the intermediate layers of the denoising network, leading to accurate object location estimation. On the FSC benchmark, CoDi outperforms the state of the art by 15%, 13%, and 10% MAE in the few-shot, one-shot, and reference-less scenarios, respectively, and it sets a new state of the art on the MCAC benchmark, outperforming the top method by 44% MAE. The code is available at https://github.com/gsustar/CoDi.
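
A density map supports two standard readouts: summing it gives a total count, and keeping its local maxima gives object locations. The minimal NumPy sketch below illustrates both with a simple max-filter form of non-maxima suppression. It is not taken from the CoDi repository; `count_from_density`, `nms_peaks`, and the `window` and `threshold` parameters are hypothetical.

```python
import numpy as np


def count_from_density(density):
    """Total count as the sum (discrete integral) of a density map."""
    return float(np.asarray(density, dtype=float).sum())


def nms_peaks(density, window=3, threshold=0.1):
    """Locate objects as local maxima of a 2-D density map.

    A pixel is kept if it equals the maximum of its window x window
    neighborhood (window must be odd) and exceeds `threshold`. Flat
    plateaus can yield several adjacent peaks.
    """
    density = np.asarray(density, dtype=float)
    pad = window // 2
    # Pad with -inf so border pixels are compared only against real values.
    padded = np.pad(density, pad, mode="constant", constant_values=-np.inf)
    h, w = density.shape
    neighborhood_max = np.full((h, w), -np.inf)
    # Sliding-window maximum built from shifted views of the padded map.
    for dy in range(window):
        for dx in range(window):
            neighborhood_max = np.maximum(neighborhood_max, padded[dy:dy + h, dx:dx + w])
    keep = (density == neighborhood_max) & (density > threshold)
    return [tuple(int(v) for v in p) for p in np.argwhere(keep)]


# Toy 7x7 density map with two blobs, each summing to 1.
d = np.zeros((7, 7))
d[1, 1], d[1, 2], d[0, 1] = 0.6, 0.2, 0.2
d[5, 4], d[5, 5], d[4, 4] = 0.5, 0.25, 0.25
print(nms_peaks(d))                     # [(1, 1), (5, 4)]
print(round(count_from_density(d), 6))  # 2.0
```

In this toy map each object's mass is concentrated around a single peak, so both readouts agree. When small objects overlap in dense regions, blurred density maps lose distinct peaks, which is why map quality matters for localization.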

Problem

Estimates object counts from a few or no exemplars

Addresses the challenge of dense regions containing small objects

Improves localization accuracy over density-based counters

Innovation

Latent diffusion model for low-shot counting

Exemplar-based conditioning module that adjusts object prototypes to the intermediate layers of the denoising network

Generates high-quality density maps for object localization
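
The abstract does not describe how the conditioning module works internally. The sketch below shows one hypothetical way to "extract and adjust" prototypes: average-pool backbone features inside each exemplar box, project the resulting prototype to each denoiser layer's width with a per-layer linear map, and apply FiLM-style scale-and-shift modulation. FiLM is a generic, swapped-in technique, and all names and shapes here are assumptions rather than CoDi's implementation.

```python
import numpy as np


def extract_prototype(features, boxes):
    """Hypothetical prototype: average-pool a (C, H, W) feature map inside each
    exemplar box (y0, x0, y1, x1 with exclusive ends), then average over boxes."""
    if not boxes:
        raise ValueError("this sketch needs at least one exemplar box")
    pooled = [features[:, y0:y1, x0:x1].mean(axis=(1, 2)) for y0, x0, y1, x1 in boxes]
    return np.mean(pooled, axis=0)


def adapt_to_layer(prototype, weight, bias):
    """Hypothetical per-layer adjustment: linear map from the prototype to
    2 * C_layer values (a scale and a shift per channel)."""
    return weight @ prototype + bias


def film_condition(layer_features, scale_shift):
    """FiLM-style modulation of a (C_layer, H, W) intermediate feature map."""
    c = layer_features.shape[0]
    if scale_shift.shape[0] != 2 * c:
        raise ValueError("scale_shift must hold one scale and one shift per channel")
    gamma, beta = scale_shift[:c], scale_shift[c:]
    # (1 + gamma) keeps the layer unchanged when the projection outputs zeros.
    return layer_features * (1.0 + gamma[:, None, None]) + beta[:, None, None]


# Usage with random stand-in tensors (shapes are illustrative only).
rng = np.random.default_rng(0)
backbone = rng.standard_normal((8, 16, 16))          # C=8 image features
proto = extract_prototype(backbone, [(2, 2, 6, 6), (8, 8, 12, 12)])
layer = rng.standard_normal((4, 8, 8))               # one denoiser layer, 4 channels
w, b = 0.1 * rng.standard_normal((8, 8)), np.zeros(8)
conditioned = film_condition(layer, adapt_to_layer(proto, w, b))
print(conditioned.shape)  # (4, 8, 8)
```

Using a separate projection per layer is one way to let the same prototype be "adjusted" differently at each depth of the network. How CoDi handles the reference-less setting, where no exemplar boxes exist, is not covered by this sketch.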