ViG-LRGC: Vision Graph Neural Networks with Learnable Reparameterized Graph Construction

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the reliance of Vision Graph Neural Networks (ViGs) on hand-crafted hyperparameters or fixed rules for graph construction. To overcome this limitation, we propose Learnable Reparameterized Graph Construction (LRGC), which models an image as a graph whose node connectivity is learned dynamically. At each layer, pairwise node relationships are scored via key-query attention, and soft-threshold reparameterization makes edge selection end-to-end differentiable, so the model learns its own connection thresholds and avoids the biases introduced by clustering or hard thresholding. LRGC requires no hyperparameter tuning for graph construction, achieving both structural adaptivity and full differentiability during training. Evaluated on ImageNet-1K, ViG-LRGC outperforms state-of-the-art ViG models of comparable capacity, demonstrating its effectiveness for image representation learning and classification.

📝 Abstract
Image representation learning is an important problem in computer vision. Traditionally, images were processed as grids using Convolutional Neural Networks, or as sequences of visual tokens using Vision Transformers. Recently, Vision Graph Neural Networks (ViG) have proposed treating images as graphs of nodes, which provides a more intuitive image representation. The challenge is to construct, at each layer, a graph that best represents the relations between nodes without requiring a hyper-parameter search. ViG models in the literature depend on non-parameterized, non-learnable statistical methods that operate on the latent features of nodes to create a graph, which might not select the best neighborhood for each node. From k-NN graph construction to hypergraph construction and similarity-thresholded graph construction, these methods lack a learnable, hyper-parameter-free way to build the graph. To overcome these challenges, we present Learnable Reparameterized Graph Construction (LRGC) for Vision Graph Neural Networks. LRGC applies key-query attention between every pair of nodes, then uses soft-threshold reparameterization for edge selection, which yields a differentiable mathematical model for training. Using learnable parameters to select the neighborhood removes the bias induced by the clustering or thresholding methods previously introduced in the literature. In addition, LRGC can tune the threshold in each layer to the training data, since the thresholds are learned during training rather than provided as hyper-parameters to the model. We demonstrate that the proposed ViG-LRGC approach outperforms state-of-the-art ViG models of similar size on the ImageNet-1K benchmark dataset.
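The abstract's mechanism (key-query attention between all node pairs, followed by a soft, learnable threshold on the scores) can be sketched in plain Python. This is a minimal illustration assuming a sigmoid-based soft thresholding with a learnable threshold `tau` and a temperature `temp`; the function and parameter names, and the exact reparameterization, are assumptions for illustration, not the authors' exact formulation.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lrgc_adjacency(feats, Wq, Wk, tau, temp=0.1):
    """Soft, differentiable edge weights between all node pairs (sketch).

    Each pairwise key-query score is passed through a sigmoid centred on a
    learnable threshold `tau`: scores well above tau give edge weights near 1,
    scores well below give weights near 0, and the whole map stays
    differentiable, so tau can be learned end-to-end instead of being a
    hand-tuned hyper-parameter.
    """
    queries = [matvec(Wq, f) for f in feats]
    keys = [matvec(Wk, f) for f in feats]
    n = len(feats)
    return [[sigmoid((dot(queries[i], keys[j]) - tau) / temp)
             for j in range(n)]
            for i in range(n)]
```

With identity projections, two orthogonal node features score 0 against each other and 1 against themselves, so a threshold of 0.5 keeps self-edges strong and suppresses cross-edges, all without any hard (non-differentiable) cut.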
Problem

Research questions and friction points this paper is trying to address.

Developing learnable graph construction for Vision GNNs without hyper-parameters
Overcoming non-learnable statistical methods for node neighborhood selection
Enabling differentiable edge selection through soft-threshold reparameterization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learnable Reparameterized Graph Construction for Vision GNNs
Uses key-query attention between all node pairs
Employs soft-threshold reparameterization for edge selection
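Once soft edge weights exist, a ViG layer aggregates each node's neighborhood through them. The row-normalized weighted mean below is a hypothetical aggregation rule chosen for illustration; the paper's actual graph-convolution operator is not specified on this page.

```python
def soft_aggregate(feats, adj):
    """Aggregate node features with soft edge weights (sketch).

    Each node's new feature is the weighted mean of all node features,
    where the weights are the soft edge values produced by the learnable
    graph-construction step. Because the weights are differentiable,
    gradients flow through both aggregation and graph construction.
    """
    dim = len(feats[0])
    out = []
    for row in adj:
        total = sum(row)  # row-normalization keeps feature scale stable
        out.append([sum(w * f[d] for w, f in zip(row, feats)) / total
                    for d in range(dim)])
    return out
```

With a (near-)identity soft adjacency, aggregation leaves features unchanged; as off-diagonal weights grow, each node mixes in more of its learned neighborhood.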
Ismael Elsharkawi
The American University in Cairo
Hossam Sharara
The American University in Cairo
Data Mining · Machine Learning · Social Network Analysis · Statistical Relational Learning
Ahmed Rafea
The American University in Cairo