Revisiting Salient Object Detection from an Observer-Centric Perspective

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inherent ambiguity of traditional salient object detection (SOD), which assigns a single ground-truth map to each image and thereby ignores the subjective preferences of human observers. To overcome this limitation, the authors propose Observer-Centric Salient Object Detection (OC-SOD), a paradigm that predicts salient regions from an individual observer's perspective by combining generic visual cues with observer-specific factors such as intent or preference. To support this direction, they introduce OC-SODBench, the first dataset for OC-SOD, comprising 33K training, validation and test images with 152K textual prompts and object pairs. They further design OC-SODAgent, an agentic baseline built on multi-modal large language models that performs OC-SOD through a "Perceive-Reflect-Adjust" process. Experiments on OC-SODBench support the effectiveness of the approach.

📝 Abstract
Salient object detection is inherently a subjective problem, as observers with different priors may perceive different objects as salient. However, existing methods predominantly formulate it as an objective prediction task with a single ground-truth segmentation map for each image, which renders the problem under-determined and fundamentally ill-posed. To address this issue, we propose Observer-Centric Salient Object Detection (OC-SOD), where salient regions are predicted by considering not only the visual cues but also observer-specific factors such as preferences or intents. As a result, this formulation captures the intrinsic ambiguity and diversity of human perception, enabling personalized and context-aware saliency prediction. By leveraging multi-modal large language models, we develop an efficient data annotation pipeline and construct the first OC-SOD dataset, named OC-SODBench, comprising 33k training, validation and test images with 152k textual prompts and object pairs. Built upon this new dataset, we further design OC-SODAgent, an agentic baseline that performs OC-SOD via a human-like "Perceive-Reflect-Adjust" process. Extensive experiments on the proposed OC-SODBench demonstrate the effectiveness of our contributions. Through this observer-centric perspective, we aim to bridge the gap between human perception and computational modeling, offering a more realistic and flexible understanding of what makes an object truly "salient." Code and dataset are publicly available at: https://github.com/Dustzx/OC_SOD
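The difference between the conventional formulation (image to one mask) and the observer-centric one (image plus observer prompt to mask) can be illustrated with a toy sketch. Everything below is hypothetical and is not the authors' OC-SODAgent or the OC-SODBench format: the "image" is a dictionary of object masks, the prompt is matched by simple keyword lookup, and the function names `perceive`, `reflect`, `adjust` and `observer_centric_saliency` are placeholders chosen only to mirror the "Perceive-Reflect-Adjust" wording of the abstract.

```python
from typing import Dict, List

Mask = List[List[int]]    # binary mask: 1 marks a salient pixel
Scene = Dict[str, Mask]   # toy stand-in for an image: object name -> mask


def iou(pred: Mask, target: Mask) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    inter = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    return inter / union if union else 1.0


def perceive(scene: Scene, prompt: str) -> Mask:
    """Observer-agnostic guess: pick the largest object, ignoring the prompt."""
    largest = max(scene, key=lambda name: sum(map(sum, scene[name])))
    return scene[largest]


def reflect(scene: Scene, prompt: str, mask: Mask) -> bool:
    """Accept the mask only if it matches an object the observer mentions."""
    text = prompt.lower()
    return any(name in text and scene[name] == mask for name in scene)


def adjust(scene: Scene, prompt: str, mask: Mask) -> Mask:
    """Switch to the first object named in the prompt, if there is one."""
    text = prompt.lower()
    for name, candidate in scene.items():
        if name in text:
            return candidate
    return mask  # nothing to go on: keep the current guess


def observer_centric_saliency(scene: Scene, prompt: str, max_rounds: int = 3) -> Mask:
    """Toy loop: initial guess, then check and revise until accepted."""
    mask = perceive(scene, prompt)
    for _ in range(max_rounds):
        if reflect(scene, prompt, mask):
            break
        mask = adjust(scene, prompt, mask)
    return mask


# One scene, two observers with different intents (assumed example data).
scene = {
    "dog":  [[1, 1, 0, 0], [1, 1, 0, 0]],
    "ball": [[0, 0, 0, 1], [0, 0, 0, 1]],
}
mask_a = observer_centric_saliency(scene, "I want to pet the dog.")
mask_b = observer_centric_saliency(scene, "I am looking for the ball to play fetch.")
```

In this toy setting the two observers receive disjoint masks for the same scene, which is the situation a single ground-truth map per image cannot represent. A real system would replace the keyword lookup with learned perception and reasoning components.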
Problem

Research questions and friction points this paper is trying to address.

Salient Object Detection
Subjectivity
Observer-Centric
Human Perception
Ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Observer-Centric Salient Object Detection
Personalized Saliency Prediction
Multi-modal Large Language Models
OC-SODBench
Perceive-Reflect-Adjust Agent

Fuxi Zhang
Dalian University of Technology
Yifan Wang
Dalian University of Technology
Video & Image Segmentation, Image Processing
Hengrun Zhao
Dalian University of Technology
Zhuohan Sun
Dalian University of Technology
Changxing Xia
Dalian University of Technology
Lijun Wang
Zhejiang University
Statistical Learning, Bioinformatics, Astrophysics
Huchuan Lu
Dalian University of Technology
Yangrui Shao
Dalian University of Technology
Chen Yang
The Hong Kong University of Science and Technology
Transfer learning, Medical Image Analysis
Long Teng
Dalian University of Technology