Self-Directed Learning of Convex Labelings on Graphs

📅 2024-09-02

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies self-directed learning for node classification on graphs: given the graph, the learner adaptively chooses the order in which it predicts node labels, aiming to make few mistakes when the label clusters are (geodesically) convex, i.e., for any two nodes sharing a label, all nodes on every shortest path between them share that label too. The authors note that no results previously existed for self-directed node classification on graphs. The main contribution is a polynomial-time algorithm that makes at most $3(h(G)+1)^4 \ln n$ mistakes on graphs with two convex clusters, where $n$ is the number of nodes and $h(G)$ is the Hadwiger number of the graph. The algorithm is also robust to slightly non-convex clusters, retaining a mistake bound logarithmic in $n$, and a separate simple and efficient algorithm handles homophilic clusters.
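To make the convexity condition concrete, here is a minimal sketch of a brute-force check that a given labeling is geodesically convex. It is only an illustration of the definition, not the paper's algorithm; the function names and the adjacency-dict representation are assumptions made for this example. It uses the fact that a node w lies on some shortest path between u and v exactly when d(u, w) + d(w, v) = d(u, v).

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from source in an unweighted, undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_convex_labeling(adj, labels):
    """Return True if every label class is geodesically convex.

    adj: dict mapping node -> iterable of neighbours (hypothetical format).
    labels: dict mapping node -> label.
    """
    dist = {u: bfs_distances(adj, u) for u in adj}
    inf = float("inf")
    for u, v in combinations(adj, 2):
        if labels[u] != labels[v] or v not in dist[u]:
            continue  # different labels, or no path between u and v
        d_uv = dist[u][v]
        for w in adj:
            # w is on a shortest u-v path iff the distances add up exactly.
            on_geodesic = dist[u].get(w, inf) + dist[v].get(w, inf) == d_uv
            if on_geodesic and labels[w] != labels[u]:
                return False
    return True


# Path graph 0-1-2-3: a contiguous split is convex, an alternating one is not.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(is_convex_labeling(path, {0: "a", 1: "a", 2: "b", 3: "b"}))
print(is_convex_labeling(path, {0: "a", 1: "b", 2: "a", 3: "b"}))
```

The check runs one breadth-first search per node and then inspects every same-label pair, so it is meant for small examples only.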

📝 Abstract

We study the problem of classifying the nodes of a given graph in the self-directed learning setup. This learning setting is a variant of online learning, where rather than an adversary determining the sequence in which nodes are presented, the learner autonomously and adaptively selects them. While self-directed learning of Euclidean halfspaces, linear functions, and general multiclass hypothesis classes was recently considered, no results previously existed specifically for self-directed node classification on graphs. In this paper, we address this problem by developing efficient algorithms for it. More specifically, we focus on the case of (geodesically) convex clusters, i.e., for every two nodes sharing the same label, all nodes on every shortest path between them also share the same label. In particular, we devise an algorithm with runtime polynomial in $n$ that makes only $3(h(G)+1)^4 \ln n$ mistakes on graphs with two convex clusters, where $n$ is the total number of nodes and $h(G)$ is the Hadwiger number, i.e., the size of the largest clique minor of the graph $G$. We also show that our algorithm is robust to the case that clusters are slightly non-convex, still achieving a mistake bound logarithmic in $n$. Finally, we devise a simple and efficient algorithm for homophilic clusters, where strongly connected nodes tend to belong to the same class.
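As a rough aid for reading the bound, the sketch below simply evaluates $3(h(G)+1)^4 \ln n$ for a given Hadwiger number and node count. The function name is an assumption made for this example, and the Hadwiger number is taken as an input rather than computed. Two standard facts help pick values: a tree with at least one edge has $h(G) = 2$, and every planar graph has $h(G) \le 4$.

```python
import math


def mistake_bound(hadwiger_number, n):
    """Evaluate 3 * (h + 1)^4 * ln(n), the bound quoted in the abstract."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return 3 * (hadwiger_number + 1) ** 4 * math.log(n)


# Compare the bound with the trivial bound of n mistakes for a tree (h = 2).
for n in (10**3, 10**6, 10**9):
    print(n, round(mistake_bound(2, n)))
```

Because of the constant factor and the fourth power, the expression only falls below the trivial bound of $n$ mistakes once $n$ is large relative to $h(G)$.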

Problem

Research questions and friction points this paper is trying to address.

Self-directed learning for graph node classification

Efficient algorithms for convex label clusters

Robustness to slightly non-convex clusters
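The self-directed protocol behind these questions can be sketched as a loop in which the learner picks the next node itself, predicts its label, sees the true label, and counts a mistake on disagreement. The strategy below (visit nodes in breadth-first order and predict the label of the already-revealed parent) is a naive baseline chosen for illustration, not the paper's algorithm; all names, including `default_label`, are hypothetical.

```python
from collections import deque


def self_directed_run(adj, true_labels, start, default_label=0):
    """Simulate self-directed learning with a naive BFS strategy.

    The learner chooses the visiting order (BFS from `start`), predicts
    the revealed label of the node's BFS parent, then observes the truth.
    Returns the number of mistakes on the component containing `start`.
    """
    revealed = {}
    mistakes = 0
    seen = {start}
    queue = deque([(start, None)])
    while queue:
        node, parent = queue.popleft()
        # The first node has no revealed neighbour, so guess a default.
        prediction = default_label if parent is None else revealed[parent]
        if prediction != true_labels[node]:
            mistakes += 1
        revealed[node] = true_labels[node]  # feedback after each prediction
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                queue.append((nb, node))
    return mistakes


# Path 0-1-2-3-4-5 with two convex clusters {0,1,2} and {3,4,5}.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(self_directed_run(path, labels, start=0))
```

On this path the baseline errs only where the label changes, but it carries no general guarantee; it is here to show the interaction pattern of choosing, predicting, and receiving feedback.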

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-directed node classification

Polynomial runtime algorithm

Convex cluster labeling

🔎 Similar Papers

Refined Graph Encoder Embedding via Self-Training and Latent Community Recovery

2024-05-21 · arXiv.org · Citations: 2
