AdaptViG: Adaptive Vision GNN with Exponential Decay Gating

📅 2025-11-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision Graph Neural Networks (ViGs) suffer from high computational overhead due to costly graph construction, limiting their efficiency. To address this, we propose AdaptViG—a highly efficient hybrid vision graph network. Our method introduces three key innovations: (1) adaptive graph convolution that jointly leverages a static axial skeleton and dynamic, content-aware gating; (2) an exponential decay gating mechanism that sparsifies long-range connections based on feature similarity; and (3) a staged information aggregation strategy combining early local gating with late global attention. On ImageNet, AdaptViG-M achieves 82.6% top-1 accuracy, outperforming the much larger ViG-B while using 80% fewer parameters and 84% fewer GMACs, and it also surpasses larger models on downstream tasks. Together, these results establish a new state-of-the-art accuracy–efficiency trade-off in efficient vision graph modeling.

📝 Abstract
Vision Graph Neural Networks (ViGs) offer a new direction for advancements in vision architectures. While powerful, ViGs often face substantial computational challenges stemming from their graph construction phase, which can hinder their efficiency. To address this issue, we propose AdaptViG, an efficient and powerful hybrid Vision GNN that introduces a novel graph construction mechanism called Adaptive Graph Convolution. This mechanism builds upon a highly efficient static axial scaffold and a dynamic, content-aware gating strategy called Exponential Decay Gating. This gating mechanism selectively weighs long-range connections based on feature similarity. Furthermore, AdaptViG employs a hybrid strategy, utilizing our efficient gating mechanism in the early stages and a full Global Attention block in the final stage for maximum feature aggregation. Our method achieves a new state-of-the-art trade-off between accuracy and efficiency among Vision GNNs. For instance, our AdaptViG-M achieves 82.6% top-1 accuracy, outperforming ViG-B by 0.3% while using 80% fewer parameters and 84% fewer GMACs. On downstream tasks, AdaptViG-M obtains 45.8 mIoU, 44.8 APbox, and 41.1 APmask, surpassing the much larger EfficientFormer-L7 by 0.7 mIoU, 2.2 APbox, and 2.1 APmask, respectively, with 78% fewer parameters.
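The abstract describes Exponential Decay Gating as weighting long-range connections by feature similarity, but the exact formulation is not reproduced on this page. The sketch below is one plausible reading, where the gate on each token pair decays exponentially with feature dissimilarity; the function name, the choice of cosine similarity, and the `alpha` decay rate are all illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def exponential_decay_gating(x, alpha=1.0):
    """Illustrative sketch of an exponential-decay gate over token features.

    x: (N, C) array of token features. For each pair (i, j), the gate decays
    exponentially with feature dissimilarity, so similar tokens keep strong
    long-range connections while dissimilar ones are suppressed toward zero
    (an approximate sparsification). `alpha` is a hypothetical decay rate.
    """
    # Cosine similarity between all token pairs, in [-1, 1].
    xn = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = xn @ xn.T
    # Exponential decay in dissimilarity: gate_ij = exp(-alpha * (1 - sim_ij)).
    return np.exp(-alpha * (1.0 - sim))

# Identical tokens gate near 1; dissimilar tokens are exponentially damped.
x = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
g = exponential_decay_gating(x, alpha=4.0)
```

A larger `alpha` would push more off-axis gates toward zero, trading graph density for compute, which is consistent with the efficiency motivation stated in the abstract.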
Problem

Research questions and friction points this paper is trying to address.

ViGs face computational challenges from graph construction phase
Need efficient hybrid Vision GNN with adaptive graph mechanism
Require better accuracy-efficiency trade-off for vision tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive Graph Convolution for efficient graph construction
Exponential Decay Gating for dynamic feature weighting
Hybrid strategy combining gating and global attention
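The "static axial scaffold" underlying the Adaptive Graph Convolution can be pictured as a fixed adjacency pattern in which each token connects to every token in its own row and column of the feature grid. The sketch below builds such a mask; it is a minimal illustration of the axial idea under that assumption, not the paper's actual graph-construction code.

```python
import numpy as np

def axial_adjacency(h, w):
    """Static axial scaffold over an h x w token grid.

    Each token is connected to all tokens sharing its row or its column,
    giving a fixed, content-independent graph that is cheap to construct
    (no per-image nearest-neighbor search). Self-loops are excluded here;
    whether to keep them is a design choice not specified on this page.
    """
    n = h * w
    adj = np.zeros((n, n), dtype=bool)
    idx = np.arange(n).reshape(h, w)
    for r in range(h):                      # connect all tokens in each row
        adj[np.ix_(idx[r], idx[r])] = True
    for c in range(w):                      # connect all tokens in each column
        adj[np.ix_(idx[:, c], idx[:, c])] = True
    np.fill_diagonal(adj, False)
    return adj
```

In the hybrid strategy, a dynamic gate (such as the exponential-decay weighting described in the abstract) would then modulate these static axial edges in the early stages, with a full Global Attention block handling the final stage.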
Mustafa Munir
The University of Texas at Austin
Machine Learning, Computer Vision, Generative AI, Superconducting Electronics, Neurosymbolic AI
Md Mostafijur Rahman
The University of Texas at Austin
R. Marculescu
The University of Texas at Austin