Revisiting Salient Object Detection from an Observer-Centric Perspective

📅 2026-02-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inherent ambiguity in traditional salient object detection (SOD), which treats saliency as a single objective ground truth per image and neglects the subjective preferences of human observers. To overcome this limitation, the authors propose Observer-Centric Salient Object Detection (OC-SOD), a paradigm that models visual saliency from an individual observer's perspective by combining generic visual cues with observer-specific factors such as intent or preference, enabling personalized prediction. To support this direction, they introduce OC-SODBench, the first benchmark dataset for OC-SOD, comprising 33K images and 152K textual prompt and object pairs that capture diverse observer contexts. They further design OC-SODAgent, an agentic baseline built on a multimodal large language model that follows a human-like "Perceive-Reflect-Adjust" process to tailor predictions to individual observers. Experiments on OC-SODBench support the effectiveness of the approach.

📝 Abstract

Salient object detection is inherently a subjective problem, as observers with different priors may perceive different objects as salient. However, existing methods predominantly formulate it as an objective prediction task with a single ground-truth segmentation map for each image, which renders the problem under-determined and fundamentally ill-posed. To address this issue, we propose Observer-Centric Salient Object Detection (OC-SOD), where salient regions are predicted by considering not only the visual cues but also observer-specific factors such as preferences or intents. As a result, this formulation captures the intrinsic ambiguity and diversity of human perception, enabling personalized and context-aware saliency prediction. By leveraging multi-modal large language models, we develop an efficient data annotation pipeline and construct the first OC-SOD dataset, named OC-SODBench, comprising 33k training, validation and test images with 152k textual prompts and object pairs. Built upon this new dataset, we further design OC-SODAgent, an agentic baseline which performs OC-SOD via a human-like "Perceive-Reflect-Adjust" process. Extensive experiments on our proposed OC-SODBench demonstrate the effectiveness of our contributions. Through this observer-centric perspective, we aim to bridge the gap between human perception and computational modeling, offering a more realistic and flexible understanding of what makes an object truly "salient." Code and dataset are publicly available at: https://github.com/Dustzx/OC_SOD
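
To make the formulation concrete, the following is a minimal Python sketch of what an observer-conditioned sample might look like, assuming each record pairs an image with a textual prompt and a binary mask. The `Sample` class, the field names, the example prompts, and the tiny masks are all hypothetical illustrations, not the actual OC-SODBench schema, and the IoU function is a generic overlap measure rather than the benchmark's stated metric.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One hypothetical OC-SOD record: image, observer prompt, binary mask."""
    image_id: str
    prompt: str
    mask: tuple  # rows of 0/1 values, all rows the same length


def iou(pred, target):
    """Intersection over union of two same-sized binary masks."""
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    inter = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    # Two empty masks are treated as a perfect match.
    return inter / union if union else 1.0


def group_by_image(samples):
    """Collect all observer-specific samples that share an image."""
    grouped = defaultdict(list)
    for sample in samples:
        grouped[sample.image_id].append(sample)
    return dict(grouped)


# Made-up data: the same image has a different salient region per observer.
samples = [
    Sample("img_001", "I am looking for something to drink",
           ((1, 1, 0, 0), (1, 1, 0, 0))),
    Sample("img_001", "I want to check the time",
           ((0, 0, 0, 1), (0, 0, 0, 1))),
]

by_image = group_by_image(samples)
thirsty, clock = by_image["img_001"]
overlap = iou(thirsty.mask, clock.mask)
```

In this toy case the two masks for `img_001` do not overlap at all, which is the situation a single ground-truth map per image cannot represent: any one map would score poorly against at least one of the observers.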

Problem

Research questions and friction points this paper is trying to address.

Salient Object Detection

Subjectivity

Observer-Centric

Human Perception

Ambiguity

Innovation

Methods, ideas, or system contributions that make the work stand out.

Observer-Centric Salient Object Detection

Personalized Saliency Prediction

Multi-modal Large Language Models