Graph Semi-Supervised Learning for Point Classification on Data Manifolds

📅 2025-06-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses semi-supervised point classification on data manifolds. Methodologically, it proposes a unified “data → manifold → graph” framework: (i) a variational autoencoder (VAE) learns a low-dimensional manifold embedding; (ii) a Gaussian-weighted graph is constructed based on geodesic distances on the learned manifold; and (iii) point classification is reformulated as a semi-supervised node classification task solved by graph neural networks (GNNs). Theoretically, it establishes the first statistical generalization analysis for this pipeline, proving that the generalization error decays with increasing graph size. Algorithmically, it introduces a dynamic resampling training mechanism that asymptotically eliminates the generalization gap. Empirically, extensive evaluation on image classification benchmarks demonstrates that the synergy between large-scale graph construction and dynamic resampling significantly improves generalization performance, approaching the Bayes-optimal error.
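The graph-construction step of the pipeline can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the embeddings here stand in for outputs of the trained VAE encoder, and the bandwidth `epsilon` is an assumed hyperparameter.

```python
import numpy as np

def gaussian_graph(embeddings, epsilon=1.0):
    """Build a Gaussian-weighted adjacency matrix from manifold embeddings.

    Edge weights decay with squared Euclidean distance in the embedding
    space, so points that are close on the learned manifold are strongly
    connected. (Illustrative kernel; the paper's exact normalization may differ.)
    """
    # Pairwise squared distances between all embeddings (broadcasting).
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    # Gaussian kernel; epsilon controls the neighborhood scale.
    W = np.exp(-sq_dist / (2 * epsilon))
    np.fill_diagonal(W, 0.0)  # no self-loops
    return W

# Toy embeddings standing in for VAE encoder outputs: two nearby points
# and one distant point.
Z = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
W = gaussian_graph(Z, epsilon=0.5)
```

The resulting weighted adjacency matrix is what the GNN then operates on for semi-supervised node classification.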

📝 Abstract
We propose a graph semi-supervised learning framework for classification tasks on data manifolds. Motivated by the manifold hypothesis, we model data as points sampled from a low-dimensional manifold $\mathcal{M} \subset \mathbb{R}^F$. The manifold is approximated in an unsupervised manner using a variational autoencoder (VAE), where the trained encoder maps data to embeddings that represent their coordinates in $\mathbb{R}^F$. A geometric graph is constructed with Gaussian-weighted edges inversely proportional to distances in the embedding space, transforming the point classification problem into a semi-supervised node classification task on the graph. This task is solved using a graph neural network (GNN). Our main contribution is a theoretical analysis of the statistical generalization properties of this data-to-manifold-to-graph pipeline. We show that, under uniform sampling from $\mathcal{M}$, the generalization gap of the semi-supervised task diminishes with increasing graph size, up to the GNN training error. Leveraging a training procedure which resamples a slightly larger graph at regular intervals during training, we then show that the generalization gap can be reduced even further, vanishing asymptotically. Finally, we validate our findings with numerical experiments on image classification benchmarks, demonstrating the empirical effectiveness of our approach.
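Gaussian-weighted edges that decay with embedding-space distance commonly take the following form; this is an illustrative kernel consistent with the abstract's description, and the paper's exact bandwidth convention may differ:

```latex
% Edge weight between nodes i and j with encoder embeddings z_i, z_j;
% \varepsilon > 0 is a bandwidth parameter (assumed form).
w_{ij} = \exp\!\left(-\frac{\lVert z_i - z_j \rVert^2}{2\varepsilon}\right)
```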
Problem

Research questions and friction points this paper is trying to address.

Classify points on data manifolds using graph semi-supervised learning
Analyze generalization of manifold-to-graph pipeline for classification
Improve classification accuracy via resampling and graph neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

VAE for unsupervised manifold approximation
Geometric graph with Gaussian-weighted edges
GNN for semi-supervised node classification
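The interplay of these pieces, and the dynamic resampling schedule in particular, can be illustrated end to end. In this sketch, simple label propagation stands in for the paper's GNN, the two-cluster data stands in for manifold samples, and the growth factor, bandwidth, and labeled set are all assumed choices; only the overall loop (resample a slightly larger graph at regular intervals, then re-solve the semi-supervised task) mirrors the paper's mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_points(n):
    # Two Gaussian clusters standing in for samples from the manifold.
    X0 = rng.normal(loc=-2.0, scale=0.5, size=(n // 2, 2))
    X1 = rng.normal(loc=+2.0, scale=0.5, size=(n - n // 2, 2))
    y = np.array([0] * (n // 2) + [1] * (n - n // 2))
    return np.vstack([X0, X1]), y

def gaussian_adjacency(X, eps=1.0):
    # Gaussian-weighted edges from pairwise squared distances.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-d2 / (2 * eps))
    np.fill_diagonal(W, 0.0)
    return W

def propagate(W, y, labeled, steps=50):
    # Label propagation as a lightweight stand-in for the GNN solver.
    F = np.zeros((len(y), 2))
    F[labeled, y[labeled]] = 1.0
    D_inv = 1.0 / W.sum(axis=1, keepdims=True)
    for _ in range(steps):
        F = D_inv * (W @ F)            # diffuse labels over the graph
        F[labeled] = 0.0
        F[labeled, y[labeled]] = 1.0   # clamp the labeled nodes
    return F.argmax(axis=1)

# Dynamic resampling loop: every interval, draw a slightly larger graph.
n = 40
for step in range(3):
    X, y = sample_points(n)
    labeled = np.array([0, len(y) - 1])  # one labeled node per class
    pred = propagate(gaussian_adjacency(X), y, labeled)
    acc = (pred == y).mean()
    n = int(n * 1.5)  # grow the graph between intervals
```

With well-separated clusters, accuracy on the unlabeled nodes stays high as the graph grows, which is the qualitative behavior the paper's generalization analysis predicts for its GNN pipeline.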