🤖 AI Summary
This work addresses the limited generalization of existing crowd counting models in unseen surveillance scenarios. To tackle this issue, the authors treat different surveillance scenes as distinct categories and propose a few-shot learning framework that leverages multiple local density learners to generate diverse local density prototypes. These prototypes are integrated with global density features to enable synergistic adaptation between local and global representations. The method further incorporates a density similarity matrix encoding scheme to enhance cross-scene transferability. Experimental results on three standard surveillance datasets demonstrate that the proposed approach significantly outperforms current state-of-the-art models, achieving notably improved counting accuracy in novel scenes.
📝 Abstract
Crowd scenes captured by cameras at different locations vary greatly, and existing crowd counting models generalize poorly to unseen surveillance scenes. To improve generalization, we treat different surveillance scenes as different scene categories and introduce few-shot learning so that the model can adapt to an unseen surveillance scene that belongs to the category of the given exemplar scene. To this end, we propose to leverage local and global density characteristics to guide crowd counting in unseen surveillance scenes. Specifically, to enable the model to adapt to the density variations in the target scene, we propose a multiple local density learner that learns multiple prototypes representing different density distributions in the support scene. The resulting local density similarity matrices are then encoded and used to guide the model locally. To further adapt to the global density of the target scene, global density features are extracted from the support image and used to guide the model globally. Experiments on three surveillance datasets show that the proposed method adapts to unseen surveillance scenes and outperforms recent state-of-the-art methods in few-shot crowd counting.
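The local and global guidance described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify how prototypes are learned or how similarity matrices are encoded, so the sketch assumes quantile-based density bands, masked average pooling for prototypes, cosine similarity for the local similarity maps, and global average pooling for the global density feature. All function names and the `num_levels` parameter are hypothetical.

```python
import numpy as np


def local_density_prototypes(support_feat, support_density, num_levels=3):
    """Assumed stand-in for the multiple local density learner.

    support_feat: (C, H, W) feature map of the support image.
    support_density: (H, W) density map of the support image.
    Returns (num_levels, C): one prototype per density band.
    """
    C = support_feat.shape[0]
    flat_feat = support_feat.reshape(C, -1)        # (C, H*W)
    flat_den = support_density.reshape(-1)         # (H*W,)
    # Assumption: split locations into bands by density quantiles.
    edges = np.quantile(flat_den, np.linspace(0.0, 1.0, num_levels + 1))
    prototypes = np.zeros((num_levels, C))
    for k in range(num_levels):
        lo, hi = edges[k], edges[k + 1]
        if k == num_levels - 1:
            mask = (flat_den >= lo) & (flat_den <= hi)  # include the maximum
        else:
            mask = (flat_den >= lo) & (flat_den < hi)
        if mask.any():
            # Masked average pooling over locations in this density band.
            prototypes[k] = flat_feat[:, mask].mean(axis=1)
    return prototypes


def similarity_maps(query_feat, prototypes, eps=1e-8):
    """Cosine similarity between each prototype and each query location.

    Returns (num_levels, H, W).
    """
    C, H, W = query_feat.shape
    q = query_feat.reshape(C, -1)
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + eps)
    p = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + eps)
    return (p @ q).reshape(len(prototypes), H, W)


def global_density_feature(support_feat):
    """Assumed global descriptor: average of support features over space."""
    return support_feat.mean(axis=(1, 2))          # (C,)


def guide_query(query_feat, support_feat, support_density, num_levels=3):
    """Concatenate query features with local and global guidance.

    Returns (C + num_levels + C, H, W).
    """
    protos = local_density_prototypes(support_feat, support_density, num_levels)
    sims = similarity_maps(query_feat, protos)
    g = global_density_feature(support_feat)
    g_map = np.broadcast_to(g[:, None, None], query_feat.shape)
    return np.concatenate([query_feat, sims, g_map], axis=0)
```

In a full model the guided tensor would be passed to a density regression head; the concatenation used here is only one plausible way to fuse the two kinds of guidance.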