🤖 AI Summary
CNNs' reliance on regular grid structures makes it difficult for them to capture complex image topology and non-local semantic relationships. To address this, we propose the Hierarchical Graph Feature Enhancement (HGFE) framework, which builds a two-level graph structure, consisting of local window graphs and global supernode graphs, to jointly encode local geometric constraints and global semantic correlations. We further introduce an adaptive frequency modulation module that dynamically regulates the propagation of high- and low-frequency information, mitigating over-smoothing while preserving edge and texture details. All components (intra-window graph convolution, inter-window supernode interaction, and frequency modulation) are lightweight, end-to-end trainable, and can be plugged into mainstream CNN backbones as modules. Experiments on CIFAR-100 (classification), PASCAL VOC and VisDrone (detection), and CrackSeg and CarParts (segmentation) show performance gains across all three tasks.
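The summary leaves the graph construction unspecified. As a rough illustration only, the NumPy sketch below shows one plausible reading of the two-level design: the pixels inside each non-overlapping window form a densely connected graph whose edge weights come from scaled dot-product similarity, each window is mean-pooled into a supernode, the supernodes exchange messages through the same kind of graph, and the combined result is added back to the input residually. Window partitioning, the similarity-softmax adjacency, mean pooling, and every function name here (`window_partition`, `hierarchical_graph_enhance`, and so on) are assumptions for illustration, not the authors' HGFE implementation.

```python
import numpy as np


def window_partition(x, w):
    """Split a (C, H, W) feature map into (num_windows, w*w, C) node sets.

    H and W are assumed to be divisible by the window size w.
    """
    c, h, wd = x.shape
    x = x.reshape(c, h // w, w, wd // w, w)
    x = x.transpose(1, 3, 2, 4, 0)  # (window rows, window cols, w, w, C)
    return x.reshape(-1, w * w, c)


def window_merge(nodes, c, h, wd, w):
    """Inverse of window_partition: (num_windows, w*w, C) -> (C, H, W)."""
    x = nodes.reshape(h // w, wd // w, w, w, c)
    x = x.transpose(4, 0, 2, 1, 3)  # (C, window rows, w, window cols, w)
    return x.reshape(c, h, wd)


def similarity_graph(nodes):
    """Dense, row-normalized adjacency from scaled dot-product similarity (assumed choice)."""
    d = nodes.shape[-1]
    scores = nodes @ np.swapaxes(nodes, -1, -2) / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for the softmax
    adj = np.exp(scores)
    return adj / adj.sum(axis=-1, keepdims=True)  # each row sums to 1


def graph_conv(nodes, weight):
    """One propagation step, ReLU(A X W), applied to each graph in the batch."""
    return np.maximum(similarity_graph(nodes) @ nodes @ weight, 0.0)


def hierarchical_graph_enhance(x, w, w_local, w_global):
    """Hypothetical two-level graph enhancement of a (C, H, W) CNN feature map."""
    c, h, wd = x.shape
    nodes = window_partition(x, w)                    # pixels as nodes, grouped per window
    local = graph_conv(nodes, w_local)                # intra-window message passing
    supernodes = nodes.mean(axis=1)                   # one mean-pooled supernode per window
    glob = graph_conv(supernodes[None], w_global)[0]  # interaction among all supernodes
    fused = local + glob[:, None, :]                  # broadcast window context to its pixels
    return x + window_merge(fused, c, h, wd, w)       # residual keeps the original features


rng = np.random.default_rng(0)
x = rng.standard_normal((16, 8, 8))  # C=16 channels on an 8x8 spatial grid
w_local = 0.1 * rng.standard_normal((16, 16))
w_global = 0.1 * rng.standard_normal((16, 16))
y = hierarchical_graph_enhance(x, 4, w_local, w_global)  # four 4x4 windows
print(y.shape)  # (16, 8, 8)
```

With both weight matrices set to zero, the graph branches output zero and the function reduces to the identity. This residual form is one common way to let such a module be inserted into an existing backbone without disturbing it at initialization.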
📝 Abstract
Convolutional neural networks (CNNs) have demonstrated strong performance in visual recognition tasks, but their inherent reliance on regular grid structures limits their capacity to model complex topological relationships and non-local semantics within images. To address this limitation, we propose hierarchical graph feature enhancement (HGFE), a novel framework that integrates graph-based reasoning into CNNs to enhance both structural awareness and feature representation. HGFE builds two complementary levels of graph structure: intra-window graph convolution to capture local spatial dependencies and inter-window supernode interactions to model global semantic relationships. Moreover, we introduce an adaptive frequency modulation module that dynamically balances the propagation of low-frequency and high-frequency signals, preserving critical edge and texture information while mitigating over-smoothing. The proposed HGFE module is lightweight, end-to-end trainable, and can be seamlessly integrated into standard CNN backbone networks. Extensive experiments on CIFAR-100 (classification), PASCAL VOC and VisDrone (detection), and CrackSeg and CarParts (segmentation) validate the effectiveness of HGFE in improving structural representation and enhancing overall recognition performance.
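The abstract does not specify how the frequency bands are separated or balanced. The sketch below assumes one simple scheme: a circular low-pass mask in the 2D FFT domain splits a feature map into a smooth band and a residual detail band, and per-channel gates computed from band statistics reweight the two before recombining them. The FFT-based split, the gate parameterization, and the names `frequency_split` and `adaptive_frequency_modulation` are hypothetical. The sketch only illustrates why reweighting the high band can counteract the low-pass behavior of repeated neighborhood averaging in graph layers, which is the usual explanation for over-smoothing.

```python
import numpy as np


def frequency_split(x, radius):
    """Split (C, H, W) features into low/high bands with a circular mask in the 2D FFT domain."""
    _, h, w = x.shape
    spec = np.fft.fftshift(np.fft.fft2(x, axes=(-2, -1)), axes=(-2, -1))  # DC moved to (h//2, w//2)
    yy, xx = np.mgrid[:h, :w]
    mask = np.hypot(yy - h // 2, xx - w // 2) <= radius  # keep frequencies near the center
    low = np.fft.ifft2(np.fft.ifftshift(spec * mask, axes=(-2, -1)), axes=(-2, -1)).real
    return low, x - low  # the high band is the residual: edges and texture


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def adaptive_frequency_modulation(x, w_low, w_high, radius=2.0):
    """Hypothetical per-channel reweighting of low- and high-frequency bands.

    Gates lie in (0, 2). A gate of 1 passes a band through unchanged, so zero
    weights reproduce the input (low + high == x up to rounding).
    """
    low, high = frequency_split(x, radius)
    # Channel statistics that drive the gates: mean of the smooth band and
    # mean magnitude of the detail band, concatenated into a (2C,) vector.
    stats = np.concatenate([low.mean(axis=(1, 2)), np.abs(high).mean(axis=(1, 2))])
    g_low = 2.0 * sigmoid(w_low @ stats)    # (C,) gate for the smooth band
    g_high = 2.0 * sigmoid(w_high @ stats)  # (C,) gate that can amplify edges and texture
    return g_low[:, None, None] * low + g_high[:, None, None] * high


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))  # C=8 channels on a 16x16 spatial grid
w_low = 0.1 * rng.standard_normal((8, 16))
w_high = 0.1 * rng.standard_normal((8, 16))
out = adaptive_frequency_modulation(x, w_low, w_high)
print(out.shape)  # (8, 16, 16)
```

Because a gate of 1 leaves its band unchanged, zero-initialized weights make the module start as an identity mapping; training can then decide, per channel, whether high-frequency detail should be attenuated or amplified.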