Video Individual Counting for Moving Drones

📅 2025-03-12

📈 Citations: 0

✨ Influential: 0

career value

243K/year

🤖 AI Summary

Existing video-based individual counting (VIC) methods suffer from significant performance degradation in dense, dynamic, low-resolution, and heavily occluded scenes captured by moving drones—due to their reliance on static viewpoints, sparse crowd datasets, and object-detection-and-tracking paradigms. This work introduces a novel VIC paradigm tailored for dynamic crowds: (1) we construct MovingDroneCrowd, the first large-scale benchmark dataset of dense crowds captured by mobile drones; (2) we propose a Deep-level Cross-frame Attention (DCFA) module that directly regresses density maps to model inter-frame crowd density dynamics, eliminating error-prone detection and association steps; and (3) we integrate deep-level convolutional attention, cross-frame density sharing, and inflow/outflow density estimation. Extensive experiments on MovingDroneCrowd and multiple public benchmarks demonstrate substantial improvements over state-of-the-art methods—particularly under high-motion, low-resolution, and severe occlusion conditions, where counting errors are reduced by over 30%.

Technology Category

Application Category

📝 Abstract

Video Individual Counting (VIC) has received increasing attentions recently due to its importance in intelligent video surveillance. Existing works are limited in two aspects, i.e., dataset and method. Previous crowd counting datasets are captured with fixed or rarely moving cameras with relatively sparse individuals, restricting evaluation for a highly varying view and time in crowded scenes. While VIC methods have been proposed based on localization-then-association or localization-then-classification, they may not perform well due to difficulty in accurate localization of crowded and small targets under challenging scenarios. To address these issues, we collect a MovingDroneCrowd Dataset and propose a density map based VIC method. Different from existing datasets, our dataset consists of videos captured by fast-moving drones in crowded scenes under diverse illuminations, shooting heights and angles. Other than localizing individuals, we propose a Depth-wise Cross-Frame Attention (DCFA) module, which directly estimate inflow and outflow density maps through learning shared density maps between consecutive frames. The inflow density maps across frames are summed up to obtain the number of unique pedestrians in a video. Experiments on our datasets and publicly available ones show the superiority of our method over the state of the arts for VIC in highly dynamic and complex crowded scenes. Our dataset and codes will be released publicly.

Problem

Research questions and friction points this paper is trying to address.

Addresses limitations in video individual counting datasets and methods.

Proposes a new dataset for crowded scenes with moving drones.

Introduces a density map method for accurate pedestrian counting.

Innovation

Methods, ideas, or system contributions that make the work stand out.

MovingDroneCrowd Dataset for dynamic scenes

Density map based Video Individual Counting

Depth-wise Cross-Frame Attention module

🔎 Similar Papers

CloudTrack: Scalable UAV Tracking with Cloud Semantics