Modality-Specific Hierarchical Enhancement for RGB-D Camouflaged Object Detection

📅 2026-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing RGB-D camouflaged object detection methods inadequately exploit modality-specific cues, limiting the effectiveness of feature fusion. To address this, the authors propose MHENet, a novel framework that incorporates a Texture Hierarchical Enhancement Module (THEM) to capture high-frequency details from RGB images and a Geometry Hierarchical Enhancement Module (GHEM) to model structural information from depth gradients. Furthermore, an Adaptive Dynamic Fusion Module (ADFM) is introduced to enable spatially varying weighted fusion, enhancing modality complementarity while preserving cross-scale semantic consistency. Extensive experiments demonstrate that MHENet significantly outperforms 16 state-of-the-art methods across four benchmark datasets, achieving leading performance in both qualitative and quantitative evaluations.
📝 Abstract
Camouflaged object detection (COD) is challenging due to high target-background similarity, and recent methods address this by complementarily using RGB-D texture and geometry cues. However, RGB-D COD methods still underutilize modality-specific cues, which limits fusion quality. We believe this is because RGB and depth features are fused directly after backbone extraction without modality-specific enhancement. To address this limitation, we propose MHENet, an RGB-D COD framework that performs modality-specific hierarchical enhancement and adaptive fusion of RGB and depth features. Specifically, we introduce a Texture Hierarchical Enhancement Module (THEM) to amplify subtle texture variations by extracting high-frequency information and a Geometry Hierarchical Enhancement Module (GHEM) to enhance geometric structures via learnable gradient extraction, while preserving cross-scale semantic consistency. Finally, an Adaptive Dynamic Fusion Module (ADFM) adaptively fuses the enhanced texture and geometry features with spatially varying weights. Experiments on four benchmarks demonstrate that MHENet surpasses 16 state-of-the-art methods qualitatively and quantitatively. Code is available at https://github.com/afdsgh/MHENet.
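The abstract describes ADFM as fusing the enhanced texture and geometry features with spatially varying weights. As a rough illustration of that idea, here is a minimal pure-Python sketch of per-pixel gated fusion of two feature maps; the function name, the softmax gating, and the list-based "tensors" are illustrative assumptions, not the paper's actual implementation (see the linked repository for that).

```python
import math

def adaptive_fusion(rgb_feat, depth_feat, rgb_score, depth_score):
    """Fuse two HxW feature maps with spatially varying (per-pixel) weights.

    rgb_feat, depth_feat: HxW grids (lists of lists) of feature values.
    rgb_score, depth_score: HxW grids of unnormalized confidence scores;
    in a real model these would come from a learned gating head.
    """
    fused = []
    for i in range(len(rgb_feat)):
        row = []
        for j in range(len(rgb_feat[i])):
            # Per-pixel softmax over the two modality scores,
            # so each location picks its own RGB/depth mixing ratio.
            e_r = math.exp(rgb_score[i][j])
            e_d = math.exp(depth_score[i][j])
            w_r = e_r / (e_r + e_d)
            w_d = 1.0 - w_r
            row.append(w_r * rgb_feat[i][j] + w_d * depth_feat[i][j])
        fused.append(row)
    return fused
```

With equal scores at every location, the fusion reduces to a plain average; unequal scores shift each pixel toward the more confident modality, which is the complementarity the abstract attributes to ADFM.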
Problem

Research questions and friction points this paper is trying to address.

Camouflaged Object Detection
RGB-D
Modality-Specific Cues
Feature Fusion
Hierarchical Enhancement
Innovation

Methods, ideas, or system contributions that make the work stand out.

modality-specific enhancement
hierarchical feature enhancement
adaptive fusion
RGB-D camouflaged object detection
learnable gradient extraction
Yuzhen Niu
Fuzhou University
Computer Graphics, Computer Vision, Multimedia, and Human-Computer Interaction
Yangqing Wang
College of Computer and Data Science, Fuzhou University, Fuzhou, China
Ri Cheng
Fuzhou University
Low-level Vision
Fusheng Li
College of Computer and Data Science, Fuzhou University, Fuzhou, China
Rongshen Wang
College of Computer and Data Science, Fuzhou University, Fuzhou, China
Zhichen Yang
College of Computer and Data Science, Fuzhou University, Fuzhou, China