S3OD: Towards Generalizable Salient Object Detection with Synthetic Data

📅 2025-10-24
🤖 AI Summary
Salient object detection (SOD) suffers from high pixel-level annotation cost and poor cross-task generalization. To address both, we propose S3OD: (1) a large-scale synthetic dataset of 139K high-resolution images generated with multimodal diffusion models, coupled with an unsupervised pseudo-labeling method that jointly uses DINO-v3 self-supervised features and intermediate diffusion representations, a combination presented as the first of its kind; (2) an ambiguity-aware multi-mask decoder that explicitly models multiple plausible interpretations of saliency; and (3) an iterative data synthesis mechanism, driven by model performance feedback, that prioritizes challenging categories. Trained solely on synthetic data, S3OD reduces cross-dataset prediction error by 20–50%. After fine-tuning, it achieves state-of-the-art performance on the DIS and HR-SOD benchmarks.

📝 Abstract
Salient object detection exemplifies data-bounded tasks where expensive pixel-precise annotations force separate model training for related subtasks like DIS and HR-SOD. We present a method that dramatically improves generalization through large-scale synthetic data generation and ambiguity-aware architecture. We introduce S3OD, a dataset of over 139,000 high-resolution images created through our multi-modal diffusion pipeline that extracts labels from diffusion and DINO-v3 features. The iterative generation framework prioritizes challenging categories based on model performance. We propose a streamlined multi-mask decoder that naturally handles the inherent ambiguity in salient object detection by predicting multiple valid interpretations. Models trained solely on synthetic data achieve 20-50% error reduction in cross-dataset generalization, while fine-tuned versions reach state-of-the-art performance across DIS and HR-SOD benchmarks.

Problem

Research questions and friction points this paper is trying to address.

Addressing data scarcity in salient object detection
Improving cross-dataset generalization using synthetic data
Handling ambiguity through a multi-mask prediction architecture
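
One common way to train a decoder that emits several candidate masks is a best-of-K (winner-takes-all) loss, where only the candidate closest to the annotation is penalized. The source does not say how the S3OD decoder is trained, so the following is a minimal sketch under that assumption. The names `mean_abs_error` and `best_of_k_loss` are hypothetical, and masks are flattened to plain lists to keep the example dependency-free.

```python
def mean_abs_error(pred, target):
    """Mean absolute error between two equally sized, flattened masks."""
    if len(pred) != len(target):
        raise ValueError("mask sizes differ")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def best_of_k_loss(candidate_masks, target):
    """Return (loss, index) of the candidate mask closest to the target.

    Assumption: only the best candidate is penalized, so the remaining
    candidates stay free to represent other plausible readings of saliency.
    """
    losses = [mean_abs_error(mask, target) for mask in candidate_masks]
    best = min(range(len(losses)), key=losses.__getitem__)
    return losses[best], best


# Two hypothetical interpretations of a 4-pixel image: the first marks
# two pixels as salient, the second marks three.
candidates = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
]
target = [1.0, 1.0, 1.0, 0.0]

loss, index = best_of_k_loss(candidates, target)
```

Here the second candidate matches the annotation exactly, so it is the one selected. In a real training loop the same selection would be applied per image on tensors rather than lists.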

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages a large-scale synthetic data generation pipeline
Uses an ambiguity-aware multi-mask decoder architecture
Implements an iterative framework that prioritizes challenging categories
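
The prioritization of challenging categories can be illustrated by turning per-category error into a generation budget for the next round. The abstract states only that prioritization is "based on model performance", so the proportional weighting, the `floor` parameter, the function names, and the category names and error values below are all illustrative assumptions rather than the authors' procedure.

```python
def category_sampling_weights(error_by_category, floor=0.05):
    """Convert per-category error into sampling probabilities.

    Assumption: weight is proportional to error plus a small floor, so
    that easy categories are down-weighted but never dropped entirely.
    """
    if not error_by_category:
        raise ValueError("no categories given")
    scores = {c: max(e, 0.0) + floor for c, e in error_by_category.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


def allocate_budget(weights, n_images):
    """Split n_images across categories using largest-remainder rounding."""
    raw = {c: w * n_images for c, w in weights.items()}
    counts = {c: int(r) for c, r in raw.items()}
    leftover = n_images - sum(counts.values())
    # Hand the images lost to rounding to the largest fractional parts.
    by_remainder = sorted(raw, key=lambda c: raw[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


# Hypothetical validation errors for three categories.
errors = {"glass": 0.30, "bicycle": 0.15, "dog": 0.05}
weights = category_sampling_weights(errors)
counts = allocate_budget(weights, n_images=1000)
```

Categories with higher error receive a larger share of the next batch of synthetic images. Repeating the measure-then-allocate step after each training round gives an iterative loop of the kind the summary describes.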