CrowdQuery: Density-Guided Query Module for Enhanced 2D and 3D Detection in Crowded Scenes

📅 2025-09-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Transformer-based 2D/3D detectors lose accuracy in crowded scenes because they neglect the object density distribution. To address this, the paper proposes CrowdQuery (CQ), a density-aware learnable query mechanism. CQ predicts a density map whose definition incorporates bounding box dimensions, embeds it, and integrates it into the decoder queries, so that density information guides detection in both 2D and 3D. Its core contributions are: (1) a unified framework for 2D and 3D crowd detection; (2) an extended density map definition that includes individual bounding box dimensions; and (3) an end-to-end learnable, density-guided query design that is not tied to a specific Transformer detector. On STCrowd, in both 2D and 3D, CQ significantly improves on the base models and outperforms most state-of-the-art methods; integrated into a state-of-the-art crowd detector, it further improves performance on CrowdHuman, which supports its generalizability.

📝 Abstract

This paper introduces a novel method for end-to-end crowd detection that leverages object density information to enhance existing transformer-based detectors. We present CrowdQuery (CQ), whose core component is our CQ module that predicts and subsequently embeds an object density map. The embedded density information is then systematically integrated into the decoder. Existing density map definitions typically depend on head positions or object-based spatial statistics. Our method extends these definitions to include individual bounding box dimensions. By incorporating density information into object queries, our method utilizes density-guided queries to improve detection in crowded scenes. CQ is universally applicable to both 2D and 3D detection without requiring additional data. Consequently, we are the first to design a method that effectively bridges 2D and 3D detection in crowded environments. We demonstrate the integration of CQ into both a general 2D and 3D transformer-based object detector, introducing the architectures CQ2D and CQ3D. CQ is not limited to the specific transformer models we selected. Experiments on the STCrowd dataset for both 2D and 3D domains show significant performance improvements compared to the base models, outperforming most state-of-the-art methods. When integrated into a state-of-the-art crowd detector, CQ can further improve performance on the challenging CrowdHuman dataset, demonstrating its generalizability. The code is released at https://github.com/mdaehl/CrowdQuery.
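The abstract states that the density map definition is extended to include individual bounding box dimensions. As a rough illustration of what such a box-aware density map could look like, the sketch below places one axis-aligned Gaussian per object and scales its spread by the box width and height. This is an assumption for illustration only, not the paper's actual definition: the function name `box_density_map`, the `(cx, cy, w, h)` box layout, the Gaussian kernel, and the `sigma_scale` parameter are all hypothetical.

```python
import math


def box_density_map(boxes, width, height, sigma_scale=0.25):
    """Illustrative density map: one Gaussian per box, spread set by box size.

    boxes: iterable of (cx, cy, w, h) in pixel coordinates (hypothetical layout).
    Each box contributes a total mass of 1, so the map sums to the object count
    whenever every kernel has non-zero mass on the grid.
    """
    density = [[0.0] * width for _ in range(height)]
    for cx, cy, w, h in boxes:
        # Assumption: the standard deviation is proportional to the box dimensions.
        sx = max(w * sigma_scale, 1e-6)
        sy = max(h * sigma_scale, 1e-6)
        kernel = [
            [
                math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))
                for x in range(width)
            ]
            for y in range(height)
        ]
        mass = sum(map(sum, kernel))
        if mass == 0.0:
            continue  # the box lies too far outside the grid to contribute
        for y in range(height):
            for x in range(width):
                density[y][x] += kernel[y][x] / mass
    return density


if __name__ == "__main__":
    # Two objects: a large box and a small box on a 32x24 grid.
    dmap = box_density_map([(8, 8, 6, 10), (20, 12, 4, 4)], width=32, height=24)
    print(round(sum(map(sum, dmap)), 6))  # total mass equals the object count
```

Under these assumptions a small box yields a sharper, higher peak than a large one, so the map carries per-object size information in addition to location.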

Problem

Research questions and friction points this paper is trying to address.

Enhancing object detection in crowded scenes using density-guided queries

Integrating object density information into transformer-based detectors

Bridging 2D and 3D detection methods for crowded environments

Innovation

Methods, ideas, or system contributions that make the work stand out.

Density map embedded into decoder queries

Density-guided queries improve crowded scene detection

Universal method for both 2D and 3D detection
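One plausible way to picture "density-guided queries" is to let each object query attend to the embedded density map and add the attended result back onto the query. The sketch below does this with a single-head, projection-free cross-attention in plain Python. It is a simplified assumption, not the CQ module itself: the source does not specify the fusion mechanism, and the names `density_guided_queries` and `density_tokens` are hypothetical. A real decoder would use learned projections and multiple heads.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def density_guided_queries(queries, density_tokens):
    """Fuse density information into queries (illustrative sketch).

    queries:        list of query vectors, each of length dim.
    density_tokens: list of embedded density-map vectors, each of length dim
                    (for example, one per spatial location).
    Returns the queries plus an attention-weighted sum of the density tokens.
    Assumption: no learned key/value/query projections are applied.
    """
    dim = len(queries[0])
    scale = 1.0 / math.sqrt(dim)
    guided = []
    for q in queries:
        # Scaled dot-product similarity between the query and each density token.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in density_tokens]
        weights = softmax(scores)
        # Attention-weighted average of the density tokens.
        context = [
            sum(w * k[d] for w, k in zip(weights, density_tokens))
            for d in range(dim)
        ]
        # Residual connection keeps the original query content.
        guided.append([qi + ci for qi, ci in zip(q, context)])
    return guided


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    density_tokens = [[2.0, 0.0], [0.0, 2.0]]
    for row in density_guided_queries(queries, density_tokens):
        print([round(v, 3) for v in row])
```

The residual form means a query is only shifted toward the density tokens it matches, which is one simple way to keep the base detector's queries intact while injecting density cues.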
