Generalized Small Object Detection: A Point-Prompted Paradigm and Benchmark

📅 2026-04-03
🤖 AI Summary
Small object detection suffers from sparse pixels, ambiguous boundaries, difficult annotation, and weak semantic features. The authors introduce TinySet-9M, which they describe as the first large-scale, multi-domain dataset for small object detection, and use it to benchmark existing label-efficient detection methods. They also propose Point-Prompt Small Object Detection (P2SOD), a paradigm that supplies sparse point prompts at inference time to aid category-level localization. Building on both, the DEAL (DEtect Any smalL object) framework achieves a 31.4% relative improvement over fully supervised baselines under strict localization metrics such as AP75 on TinySet-9M, using only a single click at inference time, and it generalizes to unseen categories and unseen datasets.
📝 Abstract
Small object detection (SOD) remains challenging due to extremely limited pixels and ambiguous object boundaries. These characteristics lead to challenging annotation, limited availability of large-scale high-quality datasets, and inherently weak semantic representations for small objects. In this work, we first address the data limitation by introducing TinySet-9M, the first large-scale, multi-domain dataset for small object detection. Beyond filling the gap in large-scale datasets, we establish a benchmark to evaluate the effectiveness of existing label-efficient detection methods for small objects. Our evaluation reveals that weak visual cues further exacerbate the performance degradation of label-efficient methods in small object detection, highlighting a critical challenge in label-efficient SOD. Secondly, to tackle the limitation of insufficient semantic representation, we move beyond training-time feature enhancement and propose a new paradigm termed Point-Prompt Small Object Detection (P2SOD). This paradigm introduces sparse point prompts at inference time as an efficient information bridge for category-level localization, enabling semantic augmentation. Building upon the P2SOD paradigm and the large-scale TinySet-9M dataset, we further develop DEAL (DEtect Any smalL object), a scalable and transferable point-prompted detection framework that learns robust, prompt-conditioned representations from large-scale data. With only a single click at inference time, DEAL achieves a 31.4% relative improvement over fully supervised baselines under strict localization metrics (e.g., AP75) on TinySet-9M, while generalizing effectively to unseen categories and unseen datasets. Our project is available at https://zhuhaoraneis.github.io/TinySet-9M/.
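To illustrate why a strict localization metric such as AP75 (average precision at an IoU threshold of 0.75) is especially demanding for small objects, the following minimal sketch compares the IoU of a box shifted by two pixels at two different object sizes. The box sizes, the shift, and the helper names `iou` and `shifted_iou` are illustrative assumptions and do not come from the paper or from TinySet-9M.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Overlap extent along each axis, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def shifted_iou(size, shift):
    """IoU between a square ground-truth box and the same box shifted along x.

    Hypothetical helper: `size` and `shift` are in pixels.
    """
    ground_truth = (0, 0, size, size)
    prediction = (shift, 0, size + shift, size)
    return iou(ground_truth, prediction)


if __name__ == "__main__":
    # Assumed sizes: 8 px stands in for a small object, 64 px for a larger one.
    for size in (8, 64):
        score = shifted_iou(size, shift=2)
        print(f"{size}x{size} box, 2 px shift: IoU = {score:.3f}, "
              f"counts at IoU 0.75: {score >= 0.75}")
```

Under these assumptions, the same two-pixel error leaves the 64-pixel box well above the 0.75 threshold (IoU about 0.939) while the 8-pixel box falls to an IoU of 0.6 and would not count as a correct detection.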
Problem

Research questions and friction points this paper is trying to address.

Small Object Detection
Label-Efficient Learning
Semantic Representation
Data Scarcity
Weak Visual Cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Point-Prompt
Small Object Detection
TinySet-9M
Label-Efficient Learning
DEAL