DVHGNN: Multi-Scale Dilated Vision HGNN for Efficient Vision Recognition

📅 2025-03-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision Graph Neural Networks (ViG) suffer from quadratic computational complexity in KNN graph construction and, because their graphs are pairwise, have limited capacity to model higher-order semantic dependencies. To address these issues, this paper proposes the Dilated Vision HyperGraph Neural Network (DVHGNN). The approach introduces three key innovations: (1) a Clustering and Dilated HyperGraph Construction (DHGC) mechanism that adaptively captures multi-scale dependencies and reduces KNN-based graph construction overhead; (2) dynamic hypergraph convolution enabling cross-scale, higher-order relational modeling; and (3) multi-scale feature fusion to enhance representation capability. Evaluated on ImageNet-1K, DVHGNN-S achieves 83.1% top-1 accuracy, outperforming ViG-S by 1.0% and ViHGNN-S by 0.6%, while attaining a better trade-off between accuracy and computational efficiency.

📝 Abstract
Recently, Vision Graph Neural Network (ViG) has gained considerable attention in computer vision. Despite its groundbreaking innovation, ViG encounters two key issues: the quadratic computational complexity caused by its K-Nearest Neighbor (KNN) graph construction, and the restriction of normal graphs to pairwise relations. To address these challenges, we propose a novel vision architecture, termed Dilated Vision HyperGraph Neural Network (DVHGNN), which is designed to leverage multi-scale hypergraphs to efficiently capture high-order correlations among objects. Specifically, the proposed method tailors Clustering and Dilated HyperGraph Construction (DHGC) to adaptively capture multi-scale dependencies among the data samples. Furthermore, a dynamic hypergraph convolution mechanism is proposed to facilitate adaptive feature exchange and fusion at the hypergraph level. Extensive qualitative and quantitative evaluations on benchmark image datasets demonstrate that the proposed DVHGNN significantly outperforms state-of-the-art vision backbones. For instance, our DVHGNN-S achieves a top-1 accuracy of 83.1% on ImageNet-1K, surpassing ViG-S by +1.0% and ViHGNN-S by +0.6%.

Problem

Research questions and friction points this paper is trying to address.

Addresses quadratic computational complexity in KNN graph construction.
Overcomes limitations of pairwise relations in normal graphs.
Captures high-order correlations efficiently using multi-scale hypergraphs.
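
The contrast between these points can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names, the fixed `centers` argument, and the hard nearest-center assignment are assumptions made for illustration, and the summary above does not specify how DVHGNN actually clusters or dilates. `knn_graph` materializes the full N x N distance matrix that makes KNN construction quadratic in the number of nodes, while `cluster_hyperedges` only compares N nodes against C centers and yields an incidence matrix in which one hyperedge can join many nodes at once.

```python
import numpy as np

def knn_graph(x, k):
    """Pairwise KNN graph over node features x of shape (N, D).

    Builds the full (N, N) squared-distance matrix, so time and memory
    grow quadratically with the number of nodes N.
    """
    sq = (x ** 2).sum(axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)  # (N, N)
    np.fill_diagonal(dist, np.inf)                       # a node is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]               # (N, k) neighbor indices

def cluster_hyperedges(x, centers):
    """Hypothetical cluster-based hyperedges (illustrative assumption).

    Each node joins the hyperedge of its nearest center, which needs only
    an (N, C) distance matrix with C much smaller than N.
    Returns a binary incidence matrix H of shape (N, C).
    """
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # (N, C)
    assign = d.argmin(axis=1)
    H = np.zeros((x.shape[0], centers.shape[0]))
    H[np.arange(x.shape[0]), assign] = 1.0
    return H
```

For a 14 x 14 patch grid (196 nodes), `knn_graph` compares 196 x 196 node pairs, whereas `cluster_hyperedges` with four centers compares 196 x 4. Each column of `H` is one hyperedge, so a relation among many nodes is stored directly rather than as a set of pairwise edges.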

Innovation

Methods, ideas, or system contributions that make the work stand out.

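Uses multi-scale hypergraphs to capture high-order correlations.
Applies Clustering and Dilated HyperGraph Construction (DHGC) to adaptively capture multi-scale dependencies.
Introduces dynamic hypergraph convolution for adaptive feature exchange and fusion at the hypergraph level.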
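
As a rough illustration of feature exchange at the hypergraph level, the sketch below implements a generic two-stage hypergraph convolution: node features are averaged into each hyperedge, then hyperedge features are averaged back to their member nodes. This is a standard mean-aggregation form and not the paper's dynamic convolution, whose details this summary does not give. The function name `hypergraph_conv` and the projection matrix `theta` are assumptions.

```python
import numpy as np

def hypergraph_conv(x, H, theta):
    """Generic two-stage hypergraph convolution (illustrative assumption).

    x:     (N, D) node features
    H:     (N, E) binary incidence matrix, H[i, e] = 1 if node i is in hyperedge e
    theta: (D, D_out) projection applied after aggregation
    """
    edge_deg = np.maximum(H.sum(axis=0), 1.0)        # nodes per hyperedge, guarded against 0
    node_deg = np.maximum(H.sum(axis=1), 1.0)        # hyperedges per node, guarded against 0
    edge_feat = (H.T @ x) / edge_deg[:, None]        # stage 1: nodes -> hyperedges (mean)
    node_feat = (H @ edge_feat) / node_deg[:, None]  # stage 2: hyperedges -> nodes (mean)
    return node_feat @ theta

# Four nodes, two hyperedges: nodes 0-1 share one hyperedge, nodes 2-3 the other.
H = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
x = np.array([[1.0], [3.0], [10.0], [20.0]])
out = hypergraph_conv(x, H, np.eye(1))
print(out.ravel())  # each node receives the mean of its hyperedge: 2, 2, 15, 15
```

Because every member of a hyperedge contributes to the same aggregated feature, a single step mixes information across the whole group, which pairwise edges can only approximate through several hops.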