AI Summary
To address inaccurate multi-class object counting under dense occlusion, this paper proposes a multi-task framework based on density-map estimation. Methodologically, it introduces a class-focusing module to suppress inter-class interference and pioneers the incorporation of a region-aware loss into multi-class density estimation. Built on a Twins pyramid vision transformer backbone, the framework combines multi-scale decoding with a dedicated multi-class counting head and segmentation-guided auxiliary learning. Contributions include: (1) significantly improved counting accuracy in high-density, heavily occluded scenes; and (2) extension of density estimation to new application domains such as biodiversity monitoring. Experiments report mean absolute error (MAE) reductions of 33%, 43%, and 64% across the VisDrone and iSAID datasets. Cross-domain experiments further validate the model's strong generalization capability.
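To make the architecture concrete, below is a minimal PyTorch sketch of a multi-class counting head with segmentation-guided class gating, the general pattern the summary describes. This is an illustration under our own assumptions, not the authors' implementation: all names (`MultiClassCountingHead`, the 1x1-conv branches, the soft masking) are hypothetical, and the paper's Category Focus Module and region-aware loss are not reproduced here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiClassCountingHead(nn.Module):
    """Hypothetical sketch: predict one density map per class from decoder
    features, then gate each map with a segmentation-derived class mask to
    suppress inter-class cross-talk (the role the paper assigns to its
    class-focusing / Category Focus Module)."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        # One density channel per object class.
        self.density = nn.Conv2d(in_channels, num_classes, kernel_size=1)
        # Auxiliary segmentation logits: num_classes + 1 background channel.
        self.segmentation = nn.Conv2d(in_channels, num_classes + 1, kernel_size=1)

    def forward(self, features: torch.Tensor):
        density = F.relu(self.density(features))       # (B, C, H, W), non-negative
        seg_logits = self.segmentation(features)       # (B, C+1, H, W)
        # Soft per-class mask from the segmentation branch (drop background).
        class_mask = seg_logits.softmax(dim=1)[:, 1:]  # (B, C, H, W)
        focused = density * class_mask                 # suppress other-class activations
        # The count for each class is the spatial integral of its density map.
        counts = focused.sum(dim=(2, 3))               # (B, C)
        return focused, seg_logits, counts


# Usage with dummy decoder features, e.g. from a Twins-style pyramid backbone.
feats = torch.randn(2, 256, 96, 96)
head = MultiClassCountingHead(in_channels=256, num_classes=10)
density_maps, seg_logits, per_class_counts = head(feats)
print(per_class_counts.shape)  # torch.Size([2, 10])
```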
Abstract
Density map estimation can be used to count objects in dense and occluded scenes where discrete counting-by-detection methods fail. We propose a multi-category counting framework that leverages a Twins pyramid vision-transformer backbone and a specialised multi-class counting head built on a state-of-the-art multi-scale decoding approach. A two-task design adds a segmentation-based Category Focus Module that suppresses inter-category cross-talk at training time. Training and evaluation on the VisDrone and iSAID benchmarks demonstrate superior performance over prior multi-category crowd-counting approaches (MAE reductions of 33%, 43%, and 64%), and a comparison with YOLOv11 underscores the necessity of crowd-counting methods in dense scenes. The method's regional loss opens multi-class crowd counting to new domains, demonstrated through application to a biodiversity monitoring dataset, highlighting its capacity to inform conservation efforts and enable scalable ecological insight.
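For reference, the reported MAE figures follow the standard crowd-counting definition; a per-class form (notation ours, not taken from the paper) is:

```latex
% Per-class mean absolute error over N test images, where
% \hat{c}_{i,k} = \sum_{p} \hat{D}_{i,k}(p) is the predicted count for
% class k (spatial sum of its density map) and c_{i,k} the ground truth.
\mathrm{MAE}_k = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{c}_{i,k} - c_{i,k} \right|
```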