DualGazeNet: A Biologically Inspired Dual-Gaze Query Network for Salient Object Detection

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing salient object detection (SOD) methods rely on complex multi-stage architectures that suffer from feature redundancy and cross-module interference, which hinder performance. In contrast, the human visual system achieves efficient saliency perception through elegant, biologically grounded mechanisms. To bridge this gap, we propose DualGazeNet, the first pure Transformer-based SOD model inspired by the dual visual pathways (magnocellular and parvocellular) of biological vision. DualGazeNet eliminates dedicated fusion modules and multi-stage designs, instead achieving multi-scale feature integration and precise boundary localization via dual attention queries and cortical attention modulation. Evaluated on five mainstream RGB benchmarks, it outperforms 25 state-of-the-art methods, runs roughly 60% faster with 53.4% fewer FLOPs than Transformer baselines of similar capacity, and demonstrates strong cross-domain generalization, for example to camouflaged and underwater scenes. DualGazeNet thus delivers superior accuracy, efficiency, and interpretability in a unified, biologically plausible framework.
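As a rough illustration of the "dual attention queries" idea mentioned above, the sketch below shows two learnable query tokens, a coarse "gaze" and a fine "gaze", cross-attending to backbone patch tokens and being decoded into per-pixel saliency logits. This is a minimal conceptual sketch in PyTorch; all module names, shapes, and the decoding step are assumptions for illustration, not the paper's actual design.

```python
# Hypothetical sketch of a dual-gaze query head: two learnable queries
# (global vs. detail "gaze") cross-attend to backbone patch features and
# are decoded into saliency logits. Illustrative only, not the authors' code.
import torch
import torch.nn as nn


class DualGazeQueryHead(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Two learnable queries: one per hypothesised "gaze".
        self.queries = nn.Parameter(torch.randn(2, dim))
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, H*W, dim) patch tokens from a Transformer backbone.
        b, n, d = feats.shape
        q = self.queries.unsqueeze(0).expand(b, -1, -1)        # (B, 2, dim)
        gaze, _ = self.cross_attn(q, feats, feats)             # (B, 2, dim)
        gaze = self.norm(gaze + q)
        # Similarity between each gaze token and every patch token gives two
        # per-pixel logit maps; averaging yields one saliency map.
        logits = torch.einsum("bqd,bnd->bqn", gaze, feats) / d ** 0.5
        return logits.mean(dim=1)                              # (B, H*W)


if __name__ == "__main__":
    head = DualGazeQueryHead(dim=256)
    tokens = torch.randn(2, 14 * 14, 256)  # e.g. a 14x14 patch grid
    print(head(tokens).shape)              # torch.Size([2, 196])
```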

📝 Abstract
Recent salient object detection (SOD) methods aim to improve performance in four key directions: semantic enhancement, boundary refinement, auxiliary task supervision, and multi-modal fusion. In pursuit of continuous gains, these approaches have evolved toward increasingly sophisticated architectures with multi-stage pipelines, specialized fusion modules, edge-guided learning, and elaborate attention mechanisms. However, this complexity paradoxically introduces feature redundancy and cross-component interference that obscure salient cues, ultimately reaching performance bottlenecks. In contrast, human vision achieves efficient salient object identification without such architectural complexity. This contrast raises a fundamental question: can we design a biologically grounded yet architecturally simple SOD framework that dispenses with most of this engineering complexity, while achieving state-of-the-art accuracy, computational efficiency, and interpretability? In this work, we answer this question affirmatively by introducing DualGazeNet, a biologically inspired pure Transformer framework that models the dual biological principles of robust representation learning and magnocellular-parvocellular dual-pathway processing with cortical attention modulation in the human visual system. Extensive experiments on five RGB SOD benchmarks show that DualGazeNet consistently surpasses 25 state-of-the-art CNN- and Transformer-based methods. On average, DualGazeNet achieves about 60% higher inference speed and 53.4% fewer FLOPs than four Transformer-based baselines of similar capacity (VST++, MDSAM, Sam2unet, and BiRefNet). Moreover, DualGazeNet exhibits strong cross-domain generalization, achieving leading or highly competitive performance on camouflaged and underwater SOD benchmarks without relying on additional modalities.
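To make the magnocellular-parvocellular idea concrete, here is a minimal conceptual sketch of two parallel pathways, a downsampled global branch and a full-resolution detail branch, with a learned gate standing in for cortical attention modulation. This is not the authors' code (the paper describes a pure Transformer, whereas this toy uses convolutions), and every module name and size is an assumption.

```python
# Conceptual sketch of dual-pathway processing with a simple gating step that
# stands in for "cortical attention modulation". Illustrative assumptions only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualPathwayBlock(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        self.magno = nn.Conv2d(channels, channels, 3, padding=1)  # coarse/global pathway
        self.parvo = nn.Conv2d(channels, channels, 3, padding=1)  # fine/detail pathway
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Magno pathway: process a 4x-downsampled view, then upsample back.
        coarse = F.interpolate(x, scale_factor=0.25, mode="bilinear", align_corners=False)
        coarse = self.magno(coarse)
        coarse = F.interpolate(coarse, size=x.shape[-2:], mode="bilinear", align_corners=False)
        # Parvo pathway: full-resolution detail features.
        fine = self.parvo(x)
        # "Cortical" modulation: coarse context gates the fine features.
        return fine * self.gate(coarse) + coarse


if __name__ == "__main__":
    block = DualPathwayBlock(64)
    print(block(torch.randn(1, 64, 128, 128)).shape)  # torch.Size([1, 64, 128, 128])
```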
Problem

Research questions and friction points this paper is trying to address.

Performance bottlenecks in salient object detection caused by architectural complexity (feature redundancy and cross-component interference)
Whether human dual-pathway visual processing can be modeled for efficient salient object identification
How to reach state-of-the-art accuracy while reducing computational cost and preserving interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Biologically inspired pure Transformer framework without dedicated fusion modules or multi-stage designs
Models magnocellular-parvocellular dual-pathway processing with dual attention queries and cortical attention modulation
Roughly 60% faster inference and 53.4% fewer FLOPs than comparable Transformer baselines
🔎 Similar Papers
No similar papers found.
Yu Zhang
School of Computation, Information and Technology, Technical University of Munich, 80333 München, Germany
Haoan Ping
School of Computation, Information and Technology, Technical University of Munich, 80333 München, Germany
Yuchen Li
Department of Informatics, Technical University of Munich, Munich, 85748, Germany, and the University Research and Innovation Center, Obuda University, Budapest, H-1034, Hungary
Zhenshan Bing
Nanjing University / Technical University of Munich
Robotics
Fuchun Sun
Department of Computer Science and Technology, Tsinghua University, Beijing 100190, China
Alois Knoll
Technische Universität München
Robotics, AI, Sensor Data Fusion, Autonomous Driving, Cyber Physical Systems