🤖 AI Summary
This study addresses lightweight multi-class object detection trained with only image-level positive and negative labels, without bounding-box annotations. To this end, the authors propose an end-to-end framework based on a Distributed Convolutional Neural Network (DisCNN) that localizes objects efficiently by modeling a multi-scale feature hierarchy, aggregating high-response regions, and running inference in parallel. The method achieves competitive detection accuracy while substantially speeding up both single-object and multi-object detection. Experimental results demonstrate that efficient, lightweight, parallel multi-class detection is feasible under image-level supervision alone, which removes the need for costly bounding-box labels during training.
📝 Abstract
Based on the Distributed Convolutional Neural Network (DisCNN), a straightforward object detection method is proposed. The moduli (magnitudes) of a DisCNN's output vectors for a specific positive class increase monotonically with the presence probabilities of the positive features. Therefore, by identifying all high-scoring patches across all possible scales and overlapping them to form a bounding box, the positive object can be detected. The essential idea is that the object is detected by detecting its features at multiple scales, ranging from specific sub-features to abstract features composed of those sub-features. Training DisCNN requires only object-centered image data with positive and negative class labels. Detection for multiple positive classes can be run in parallel, which accelerates it significantly, and single-object detection is also faster because of the lightweight model architecture.
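The detection procedure described in the abstract (score patches at many scales, keep the high-scoring ones, merge them into a box, and handle each class independently) can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: square sliding windows with a 50% stride, a fixed score threshold, and a merge rule that takes the tightest box enclosing every patch above the threshold. `make_toy_score_fn` is a hypothetical placeholder for the magnitude of a trained DisCNN's output vector for one class. It is not the authors' model. Per-class detection runs in a thread pool to mirror the parallelism the abstract mentions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, Optional, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1); x1 and y1 are exclusive
ScoreFn = Callable[[np.ndarray], float]


def sliding_patches(height: int, width: int, scales: Sequence[int],
                    stride_frac: float = 0.5) -> Iterator[Box]:
    """Enumerate square patches of every side length in `scales` (assumed scheme)."""
    for side in scales:
        if side > height or side > width:
            continue  # patch does not fit in the image at this scale
        step = max(1, int(side * stride_frac))
        for y in range(0, height - side + 1, step):
            for x in range(0, width - side + 1, step):
                yield (x, y, x + side, y + side)


def detect_class(image: np.ndarray, score_fn: ScoreFn, scales: Sequence[int],
                 threshold: float) -> Optional[Box]:
    """Return the tightest box enclosing all patches scoring >= threshold, or None."""
    h, w = image.shape[:2]
    hits = [b for b in sliding_patches(h, w, scales)
            if score_fn(image[b[1]:b[3], b[0]:b[2]]) >= threshold]
    if not hits:
        return None  # no patch looked like the positive class
    x0s, y0s, x1s, y1s = zip(*hits)
    return (min(x0s), min(y0s), max(x1s), max(y1s))


def detect_all(image: np.ndarray, score_fns: Dict[str, ScoreFn],
               scales: Sequence[int], threshold: float) -> Dict[str, Optional[Box]]:
    """Run the independent per-class detectors concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(detect_class, image, fn, scales, threshold)
                   for name, fn in score_fns.items()}
        return {name: fut.result() for name, fut in futures.items()}


def make_toy_score_fn(class_id: int) -> ScoreFn:
    """Hypothetical stand-in for a DisCNN's per-class output magnitude:
    the fraction of pixels in the patch labeled `class_id`."""
    def score(patch: np.ndarray) -> float:
        return float(np.mean(patch == class_id))
    return score


if __name__ == "__main__":
    img = np.zeros((64, 64), dtype=np.int64)
    img[8:24, 8:24] = 1     # toy object of class 1
    img[40:56, 32:56] = 2   # toy object of class 2
    fns = {f"class_{c}": make_toy_score_fn(c) for c in (1, 2, 3)}
    print(detect_all(img, fns, scales=(8, 16), threshold=0.75))
    # → {'class_1': (8, 8, 24, 24), 'class_2': (32, 40, 56, 56), 'class_3': None}
```

Threads keep the sketch short; for real speedups a process pool or batched GPU inference over patches would likely be more appropriate. Note also that a union-of-hits merge grows the box when the threshold is low, because partially overlapping patches are included, so the threshold trades recall against box tightness.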