AI Summary
This work addresses two challenges in node classification on text-attributed graphs (TAGs) under label-scarce settings: shallow text encoding and heavy reliance on labeled data. By reformulating node classification as a link prediction task and leveraging graph homophily, the authors propose HopRank, a self-supervised framework that constructs hierarchical preference data from unlabeled graph structure. The authors present this as the first integration of large language models (LLMs) with graph structure for self-supervised preference learning that requires no annotated labels during training. The approach uses hop-based sampling to build hierarchical preferences and an adaptive early-exit voting mechanism for classification. On three benchmark datasets, the method trains without any labels while matching the performance of fully supervised graph neural networks (GNNs) and significantly outperforming existing graph-LLM approaches.
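The hop-based sampling idea can be illustrated with a short sketch. It assumes, hypothetically, that a preference triple pairs a "chosen" node at hop distance h from an anchor with a "rejected" node at hop h + 1, so that under homophily the closer node is the preferred link partner. The function names `hop_distances` and `sample_preferences`, the `max_hop` cutoff, and the adjacent-tier pairing rule are illustrative assumptions, not HopRank's published procedure.

```python
import random
from collections import deque


def hop_distances(adj: dict[int, list[int]], source: int, max_hop: int) -> dict[int, int]:
    """Breadth-first search from `source`; returns hop counts for nodes within `max_hop`."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == max_hop:
            continue  # do not expand past the hop cutoff
        for v in adj.get(u, []):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def sample_preferences(adj: dict[int, list[int]], anchor: int, max_hop: int = 3,
                       pairs_per_level: int = 2, seed: int = 0) -> list[tuple[int, int, int]]:
    """Return (anchor, chosen, rejected) triples where `chosen` sits exactly one
    hop tier closer to `anchor` than `rejected` (hypothetical pairing rule)."""
    rng = random.Random(seed)
    dist = hop_distances(adj, anchor, max_hop)
    far_tier = max_hop + 1  # unreachable or beyond-cutoff nodes share one "far" tier
    tiers: dict[int, list[int]] = {h: [] for h in range(1, far_tier + 1)}
    for node in adj:
        if node != anchor:
            tiers[dist.get(node, far_tier)].append(node)
    triples = []
    for h in range(1, far_tier):
        closer, farther = tiers[h], tiers[h + 1]
        if not closer or not farther:
            continue  # skip levels with no valid pair
        for _ in range(pairs_per_level):
            triples.append((anchor, rng.choice(closer), rng.choice(farther)))
    return triples


# Usage on a 6-node path graph 0-1-2-3-4-5
adj = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 5] for i in range(6)}
for anchor, chosen, rejected in sample_preferences(adj, anchor=0):
    print(f"anchor {anchor}: prefer {chosen} over {rejected}")
```

Pairing only adjacent tiers is one simple way to obtain a hierarchy of comparisons (1 vs 2 hops, 2 vs 3 hops, and so on); other pairing or weighting schemes are equally plausible.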
Abstract
Node classification on text-attributed graphs (TAGs) is a fundamental task with broad applications in citation analysis, social networks, and recommendation systems. Current GNN-based approaches suffer from shallow text encoding and heavy dependence on labeled data, limiting their effectiveness in label-scarce settings. While large language models (LLMs) naturally address the text understanding gap with deep semantic reasoning, existing LLM-for-graph methods either still require abundant labels during training or fail to exploit the rich structural signals freely available in graph topology. Our key observation is that, in many real-world TAGs, edges predominantly connect similar nodes under the homophily principle, meaning graph topology inherently encodes class structure without any labels. Building on this insight, we reformulate node classification as a link prediction task and present HopRank, a fully self-supervised LLM-tuning framework for TAGs. HopRank constructs preference data via hierarchical hop-based sampling and employs adaptive preference learning to prioritize informative training signals without any class labels. At inference, nodes are classified by predicting their connection preferences to labeled anchors, with an adaptive early-exit voting scheme to improve efficiency. Experiments on three TAG benchmarks show that HopRank matches fully supervised GNNs and substantially outperforms prior graph-LLM methods, despite using zero labeled training data.
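Early-exit voting over labeled anchors can be sketched as follows. The `predict_once` callback is a hypothetical stand-in for a single LLM query that shows the node one labeled anchor per class and returns the class whose anchor it predicts a link to. The stopping rule used here (stop once the leading class cannot be overtaken by the remaining rounds) is an assumed, simple form of adaptive early exit, not necessarily the rule HopRank uses.

```python
from collections import Counter
from typing import Callable


def early_exit_vote(predict_once: Callable[[int], str], max_votes: int = 7) -> tuple[str, int]:
    """Majority vote over at most `max_votes` rounds, stopping as soon as the
    leading class can no longer be overtaken. Returns (label, rounds_used)."""
    if max_votes < 1:
        raise ValueError("max_votes must be at least 1")
    votes: Counter = Counter()
    used = 0
    for r in range(max_votes):
        votes[predict_once(r)] += 1  # one (hypothetical) LLM query per round
        used = r + 1
        top = votes.most_common(2)
        lead = top[0][1]
        runner_up = top[1][1] if len(top) > 1 else 0
        if lead - runner_up > max_votes - used:
            break  # remaining rounds cannot change the winner
    # On an exact tie after all rounds, Counter keeps the label seen first.
    return votes.most_common(1)[0][0], used


# Usage with canned predictions standing in for LLM outputs
preds = ["A", "A", "A", "B", "A"]
print(early_exit_vote(lambda r: preds[r], max_votes=5))  # ('A', 3)
```

Because the loop exits only when the outcome is already decided, it returns the same label as running all `max_votes` rounds while skipping LLM calls on easy nodes.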