🤖 AI Summary
Existing salient object detection (SOD) methods rely on complex multi-stage architectures that suffer from feature redundancy and cross-module interference, hindering performance. In contrast, the human visual system achieves efficient saliency perception through elegant, biologically grounded mechanisms. To bridge this gap, we propose DualGazeNet, the first pure Transformer-based SOD model inspired by the dual visual pathways (magnocellular and parvocellular) of biological vision. DualGazeNet eliminates dedicated fusion modules and multi-stage designs, instead achieving multi-scale feature integration and precise boundary localization via dual attention queries and cortical attention modulation. Evaluated on five mainstream RGB benchmarks, it outperforms 25 state-of-the-art methods while delivering about 60% higher inference speed and 53.4% fewer FLOPs than comparable Transformer baselines, and it generalizes strongly across domains, e.g., to camouflaged and underwater scenes. DualGazeNet thus delivers superior accuracy, efficiency, and interpretability in a unified, biologically plausible framework.
📝 Abstract
Recent salient object detection (SOD) methods aim to improve performance in four key directions: semantic enhancement, boundary refinement, auxiliary task supervision, and multi-modal fusion. In pursuit of continuous gains, these approaches have evolved toward increasingly sophisticated architectures with multi-stage pipelines, specialized fusion modules, edge-guided learning, and elaborate attention mechanisms. However, this complexity paradoxically introduces feature redundancy and cross-component interference that obscure salient cues, ultimately reaching performance bottlenecks. In contrast, human vision achieves efficient salient object identification without such architectural complexity. This contrast raises a fundamental question: can we design a biologically grounded yet architecturally simple SOD framework that dispenses with most of this engineering complexity, while achieving state-of-the-art accuracy, computational efficiency, and interpretability? In this work, we answer this question affirmatively by introducing DualGazeNet, a biologically inspired pure Transformer framework that models the dual biological principles of robust representation learning and magnocellular-parvocellular dual-pathway processing with cortical attention modulation in the human visual system. Extensive experiments on five RGB SOD benchmarks show that DualGazeNet consistently surpasses 25 state-of-the-art CNN- and Transformer-based methods. On average, DualGazeNet achieves about 60% higher inference speed and 53.4% fewer FLOPs than four Transformer-based baselines of similar capacity (VST++, MDSAM, Sam2unet, and BiRefNet). Moreover, DualGazeNet exhibits strong cross-domain generalization, achieving leading or highly competitive performance on camouflaged and underwater SOD benchmarks without relying on additional modalities.
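The core mechanism described above — two pathway-specific attention queries (magnocellular and parvocellular) pooling a shared feature representation in place of dedicated fusion modules — can be illustrated with a minimal sketch. The paper does not publish this exact formulation here, so the function names, shapes, and scaled dot-product pooling below are illustrative assumptions, not DualGazeNet's actual implementation:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(features, query):
    """Pool a list of d-dim feature vectors by scaled dot-product with one query."""
    d = len(query)
    scores = [sum(f[i] * query[i] for i in range(d)) / math.sqrt(d) for f in features]
    w = softmax(scores)
    return [sum(w[n] * features[n][i] for n in range(len(features))) for i in range(d)]

def dual_query_attention(features, q_magno, q_parvo):
    """Two pathway queries (hypothetical names) read the same shared tokens:
    one query can specialize toward coarse/global structure, the other toward
    fine detail, without any explicit cross-pathway fusion module."""
    return attend(features, q_magno), attend(features, q_parvo)

# Toy usage: each query emphasizes different tokens of the shared features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
magno, parvo = dual_query_attention(feats, [2.0, 0.0], [0.0, 2.0])
print(magno, parvo)
```

The design point this sketches is the one the abstract makes: both pathways operate on one shared representation, so "fusion" reduces to how each query weights the same tokens rather than a separate multi-stage merging module.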