🤖 AI Summary
Salient object detection (SOD) faces two challenges: high pixel-level annotation cost and poor cross-task generalization. To address these, we propose S3OD: (1) a large-scale synthetic dataset of 139K high-resolution images generated via multimodal diffusion models, coupled with an unsupervised, high-accuracy pseudo-labeling method that jointly leverages DINO-v3 self-supervised features and intermediate diffusion representations, the first approach of its kind; (2) an ambiguity-aware multi-mask decoder that explicitly models multiple plausible interpretations of saliency; and (3) a performance-feedback-driven iterative data-synthesis mechanism that dynamically prioritizes hard categories. Trained solely on synthetic data, S3OD reduces cross-dataset prediction error by 20–50%; after fine-tuning, it achieves state-of-the-art results on the DIS and HR-SOD benchmarks.
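The feedback loop in (3) amounts to re-weighting the next round of synthesis toward categories where the current model performs worst. A minimal sketch of one plausible scheme (error-proportional sampling; the function names and weighting rule are assumptions, as the paper's exact mechanism is not given here):

```python
import random

def category_sampling_weights(per_category_error):
    """Normalize per-category validation error into sampling probabilities,
    so harder categories are generated more often in the next round.
    (Hypothetical: S3OD's actual weighting rule may differ.)"""
    total = sum(per_category_error.values())
    return {cat: err / total for cat, err in per_category_error.items()}

def sample_next_batch(per_category_error, n, rng=None):
    """Draw n category labels for the next synthesis round,
    proportionally to each category's current error."""
    rng = rng or random.Random(0)
    cats = list(per_category_error)
    weights = [per_category_error[c] for c in cats]
    return rng.choices(cats, weights=weights, k=n)
```

With errors `{"glass": 0.4, "wire": 0.4, "animal": 0.2}`, transparent and thin structures would dominate the next generation batch, mirroring the "prioritize challenging categories" behavior described above.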
📝 Abstract
Salient object detection exemplifies data-bounded tasks where expensive pixel-precise annotations force separate model training for related subtasks like DIS and HR-SOD. We present a method that dramatically improves generalization through large-scale synthetic data generation and ambiguity-aware architecture. We introduce S3OD, a dataset of over 139,000 high-resolution images created through our multi-modal diffusion pipeline that extracts labels from diffusion and DINO-v3 features. The iterative generation framework prioritizes challenging categories based on model performance. We propose a streamlined multi-mask decoder that naturally handles the inherent ambiguity in salient object detection by predicting multiple valid interpretations. Models trained solely on synthetic data achieve 20–50% error reduction in cross-dataset generalization, while fine-tuned versions reach state-of-the-art performance across DIS and HR-SOD benchmarks.
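A common way to train a decoder that outputs multiple valid interpretations is a winner-takes-all loss: score every predicted mask against the single ground-truth mask and backpropagate only through the best hypothesis, so each head specializes in one plausible reading of the scene. A minimal NumPy sketch under that assumption (the abstract does not specify S3OD's actual training loss):

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Pixel-wise binary cross-entropy, averaged over the mask."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

def winner_takes_all_loss(mask_preds, gt_mask):
    """Score each of the K predicted masks against the ground truth and
    return the loss and index of the best hypothesis; in training, only
    that hypothesis would receive gradients."""
    losses = [bce(m, gt_mask) for m in mask_preds]
    best = int(np.argmin(losses))
    return losses[best], best
```

Because the other K−1 heads incur no penalty when one head fits the annotation, the decoder is free to keep alternative saliency interpretations alive rather than averaging them into a single blurry mask.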