🤖 AI Summary
Problem: Multi-view crowd counting suffers from a scarcity of annotated multi-view data, and existing datasets are limited in scene diversity and number of frames. Method: This paper proposes two semi-supervised frameworks that rank multi-view fusion models fed different numbers of camera views. The first (vanilla) method encodes the monotonicity of count predictions across views as a semi-supervised regularization prior: a model given fewer views should not predict a larger count than one given more views. The second method estimates model uncertainty with Monte Carlo Dropout and ranks these uncertainties, guided by the models' prediction errors: a model given more views should not be more uncertain than one given fewer views. Both ranking constraints aim to improve robustness and generalization when labels are scarce. Contribution/Results: Evaluated under limited labeling budgets, the approach reduces mean absolute error by 12.7% over state-of-the-art semi-supervised approaches. Gains are especially pronounced in heavily occluded scenes, significantly reducing reliance on densely annotated multi-view data.
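The uncertainty-ranking idea can be illustrated with a small sketch. It assumes that uncertainty is measured as the variance of repeated stochastic (dropout-active) count predictions and that ordering violations are penalized with a hinge. The functions `mc_dropout_uncertainty`, `uncertainty_ranking_loss`, and `toy_model` are hypothetical names for illustration, not the paper's code, and the error-guided supervision of the uncertainties is not modeled here.

```python
import random
import statistics


def mc_dropout_uncertainty(stochastic_count, views, n_samples=20, seed=0):
    """Mean and variance of repeated stochastic count predictions.

    `stochastic_count(views, rng)` is a hypothetical stand-in for a fusion
    network evaluated with dropout left active; each call returns one sample.
    """
    rng = random.Random(seed)
    samples = [stochastic_count(views, rng) for _ in range(n_samples)]
    return statistics.mean(samples), statistics.pvariance(samples)


def uncertainty_ranking_loss(uncertainties):
    """Hinge penalty for uncertainties ordered by increasing number of views.

    uncertainties[k] belongs to the model fed k+1 views (nested subsets, an
    assumption). The constraint u(more views) <= u(fewer views) is enforced
    by adding max(0, u_more - u_fewer) for each adjacent pair.
    """
    return sum(max(0.0, u_more - u_fewer)
               for u_fewer, u_more in zip(uncertainties, uncertainties[1:]))


def toy_model(views, rng):
    # Toy stand-in: true count is 100 and the noise shrinks as views are added.
    noise_scale = 10.0 / len(views)
    return 100.0 + rng.gauss(0.0, noise_scale)


views = ["cam0", "cam1", "cam2"]
uncs = []
for k in range(1, len(views) + 1):
    _, var = mc_dropout_uncertainty(toy_model, views[:k])
    uncs.append(var)
print(uncertainty_ranking_loss(uncs))  # 0.0: no ordering violations in this toy
```

In the toy model the noise shrinks with more views, so the estimated variances already decrease and the penalty is zero; a model whose uncertainty grew with more views would receive a positive penalty.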
📝 Abstract
Multi-view crowd counting has been proposed to deal with the severe occlusion issue of crowd counting in large and wide scenes. However, due to the difficulty of collecting and annotating multi-view images, the datasets for multi-view counting have a limited number of multi-view frames and scenes. To solve the problem of limited data, one approach is to collect synthetic data to bypass the annotation step, while another is to propose semi-supervised, weakly supervised, or unsupervised methods that demand less multi-view data. In this paper, we propose two semi-supervised multi-view crowd counting frameworks that rank multi-view fusion models with different numbers of input views, in terms of either the model predictions or the model uncertainties. Specifically, for the first method (vanilla model), we rank the multi-view fusion models' prediction results for different numbers of camera-view inputs, namely, the model's predictions with fewer camera views shall not be larger than the predictions with more camera views. For the second method, we rank the estimated model uncertainties of the multi-view fusion models with a variable number of view inputs, guided by the multi-view fusion models' prediction errors, namely, the model uncertainties with more camera views shall not be larger than those with fewer camera views. These constraints are introduced into model training in a semi-supervised fashion for multi-view counting with limited labeled data. The experiments demonstrate the advantages of the proposed multi-view model ranking methods compared with other semi-supervised counting methods.
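The prediction-ranking constraint of the first method can be sketched as a hinge penalty combined with a supervised term. This is a minimal illustration under stated assumptions: the fusion model is reduced to a hypothetical `predict(views)` returning a scalar count (rather than a density map), the fewer-view inputs are nested prefixes of the full view list, the supervised term is absolute error, and the balancing `weight=0.1` is an arbitrary choice. None of these names or values come from the paper.

```python
from typing import Callable, Sequence


def prediction_ranking_loss(counts: Sequence[float]) -> float:
    """Hinge penalty enforcing count(fewer views) <= count(more views).

    counts[k] is the predicted count given the first k+1 camera views
    (nested view subsets, an assumption of this sketch).
    """
    return sum(max(0.0, c_fewer - c_more)
               for c_fewer, c_more in zip(counts, counts[1:]))


def semi_supervised_loss(
    labeled: Sequence[tuple[Sequence[str], float]],
    unlabeled: Sequence[Sequence[str]],
    predict: Callable[[Sequence[str]], float],
    weight: float = 0.1,
) -> float:
    """Mean absolute error on labeled scenes plus weighted ranking on unlabeled ones."""
    sup = sum(abs(predict(v) - gt) for v, gt in labeled) / max(len(labeled), 1)
    rank = 0.0
    for views in unlabeled:
        # Predict with 1, 2, ..., all views and penalize any decrease in count.
        counts = [predict(views[:k]) for k in range(1, len(views) + 1)]
        rank += prediction_ranking_loss(counts)
    rank /= max(len(unlabeled), 1)
    return sup + weight * rank


def toy_predict(views: Sequence[str]) -> float:
    # Toy stand-in for a fusion model: deliberately non-monotone in the view count.
    return {1: 50.0, 2: 40.0, 3: 90.0}[len(views)]


labeled = [(["cam0", "cam1", "cam2"], 100.0)]
unlabeled = [["cam0", "cam1", "cam2"]]
print(semi_supervised_loss(labeled, unlabeled, toy_predict))  # 10.0 + 0.1 * 10.0 = 11.0
```

Here the drop from 50 to 40 when adding a second view violates the ordering and contributes a penalty of 10, which only affects training through unlabeled scenes; the labeled scene contributes its absolute error of 10.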