MMDEW: Multipurpose Multiclass Density Estimation in the Wild

📅 2025-10-02
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address inaccurate multi-class object counting in dense, occluded scenes, this paper proposes a two-task framework based on density-map estimation. It introduces a segmentation-based Category Focus Module that suppresses inter-class interference during training, and it uses a regional loss for multi-class density estimation. Built on a Twins pyramid vision-transformer backbone, the framework combines multiscale decoding with a dedicated multi-class counting head. Contributions include: (1) improved counting accuracy in high-density, heavily occluded scenes; and (2) extension of multi-class density estimation to new application domains such as biodiversity monitoring. On the VisDrone and iSAID benchmarks, the method reduces mean absolute error (MAE) by 33%, 43%, and 64% relative to prior multicategory crowd-counting approaches. An application to a biodiversity monitoring dataset illustrates its use beyond the benchmark domains.
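The MAE figures above are relative reductions in counting error. As a minimal sketch of how such a number is computed, the snippet below evaluates per-class MAE for two sets of predicted counts and reports the percentage reduction. The helper names and all count values are made up for illustration; none of them come from the paper.

```python
import numpy as np

def per_class_mae(pred_counts, true_counts):
    """Mean absolute count error per class; inputs have shape (images, classes)."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    return np.abs(pred - true).mean(axis=0)

def relative_reduction(baseline_mae, new_mae):
    """Percentage by which new_mae is lower than baseline_mae."""
    return 100.0 * (baseline_mae - new_mae) / baseline_mae

# Hypothetical counts for 3 images and 2 classes (illustrative only).
true_counts = np.array([[10, 4], [20, 6], [30, 8]])
baseline_pred = np.array([[14, 4], [26, 9], [32, 8]])
new_pred = np.array([[12, 4], [23, 7], [31, 8]])

baseline_mae = per_class_mae(baseline_pred, true_counts)
new_mae = per_class_mae(new_pred, true_counts)
reduction = relative_reduction(baseline_mae, new_mae)
print(baseline_mae, new_mae, reduction)
```

A reduction of 50% here means the new model's average count error for that class is half the baseline's, which is the sense in which the 33%, 43%, and 64% figures should be read.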

📝 Abstract
Density map estimation can be used to estimate object counts in dense and occluded scenes where discrete counting-by-detection methods fail. We propose a multicategory counting framework that leverages a Twins pyramid vision-transformer backbone and a specialised multi-class counting head built on a state-of-the-art multiscale decoding approach. A two-task design adds a segmentation-based Category Focus Module, suppressing inter-category cross-talk at training time. Training and evaluation on the VisDrone and iSAID benchmarks demonstrate superior performance versus prior multicategory crowd-counting approaches (33%, 43% and 64% reductions in MAE), and the comparison with YOLOv11 underscores the necessity of crowd-counting methods in dense scenes. The method's regional loss opens up multi-class crowd counting to new domains, as demonstrated by its application to a biodiversity monitoring dataset, highlighting its capacity to inform conservation efforts and enable scalable ecological insights.
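For readers unfamiliar with density-map counting, the sketch below shows the general idea, not the paper's pipeline: each annotated object contributes a unit-mass Gaussian blob to its class's map, so summing a class map recovers that class's count. The function names, the kernel width `sigma`, and the border renormalisation are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_kernel(size, sigma):
    """Square 2D Gaussian kernel normalised to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()

def density_maps(points, num_classes, height, width, sigma=2.0):
    """Build one density map per class from (class_id, row, col) point annotations."""
    size = int(6 * sigma) | 1          # odd kernel size covering about +/- 3 sigma
    r = size // 2
    kernel = gaussian_kernel(size, sigma)
    maps = np.zeros((num_classes, height, width))
    for cls, y, x in points:
        # Clip the stamp to the image and renormalise so each object adds mass 1.
        y0, y1 = max(0, y - r), min(height, y + r + 1)
        x0, x1 = max(0, x - r), min(width, x + r + 1)
        patch = kernel[y0 - (y - r):y1 - (y - r), x0 - (x - r):x1 - (x - r)]
        maps[cls, y0:y1, x0:x1] += patch / patch.sum()
    return maps

# Hypothetical annotations: two objects of class 0, one of class 1.
points = [(0, 5, 5), (0, 0, 0), (1, 10, 20)]
maps = density_maps(points, num_classes=2, height=32, width=32)
counts = maps.sum(axis=(1, 2))        # per-class count is the integral of the map
print(counts)
```

Because the count is an integral over the map and not a tally of detected boxes, overlapping objects still contribute their full mass, which is why this formulation suits dense and occluded scenes.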
Problem

Research questions and friction points this paper is trying to address.

Estimating object counts in dense occluded scenes using density maps
Developing multicategory counting with transformer backbone and segmentation module
Extending crowd counting to biodiversity monitoring for ecological insights
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Twins pyramid vision-transformer backbone
Adds segmentation-based Category Focus Module
Uses specialized multi-class counting head
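The summary does not describe how the Category Focus Module works internally. As one plausible illustration of segmentation-guided suppression of inter-category cross-talk, the sketch below gates each class's raw density map with per-pixel class probabilities from a segmentation branch. The gating form, the background channel, and the names `focus_density` and `seg_logits` are all assumptions, not the authors' design.

```python
import numpy as np

def softmax(logits, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def focus_density(raw_density, seg_logits):
    """Weight per-class density by segmentation probabilities (hypothetical gating).

    raw_density: (C, H, W) non-negative density maps, one per class.
    seg_logits:  (C + 1, H, W) segmentation logits; channel 0 is background.
    """
    probs = softmax(seg_logits, axis=0)
    return raw_density * probs[1:]     # drop background, keep the C class channels

# Toy example: both classes predict uniform density everywhere (full cross-talk),
# while segmentation assigns the left half to class 0 and the right half to class 1.
raw_density = np.ones((2, 4, 4))
seg_logits = np.zeros((3, 4, 4))
seg_logits[1, :, :2] = 10.0
seg_logits[2, :, 2:] = 10.0

focused = focus_density(raw_density, seg_logits)
print(focused[0, :, :2].sum(), focused[0, :, 2:].sum())
```

In this toy case the class 0 density survives almost unchanged where segmentation agrees with it and is driven close to zero elsewhere, which is the kind of cross-talk suppression the module is described as providing.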