Scaling GraphLLM with Bilevel-Optimized Sparse Querying

📅 2026-01-30

🏛️ arXiv.org

🤖 AI Summary

This work addresses the high computational and monetary costs of frequent large language model (LLM) invocations in node-level tasks on large-scale graphs. To reduce these costs, the authors propose BOSQ, a framework that uses bilevel optimization to enable adaptive sparse querying. BOSQ employs a dynamic decision policy, jointly learned by a graph neural network and an LLM, that invokes the LLM only for high-value nodes to generate interpretable features. Experiments on six real-world text-attributed graph datasets show that BOSQ sharply reduces the number of LLM calls while matching or exceeding the performance of existing GraphLLM methods on both node classification and link prediction.
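The core idea of sparse querying, calling the LLM for only a budgeted subset of nodes, can be illustrated with a minimal sketch. It assumes a GNN has already produced class probabilities per node and uses predictive entropy as a stand-in for "high value". That heuristic, the budget, and the helper names (`select_nodes_to_query`, `enrich_with_explanations`, `query_llm`) are hypothetical; BOSQ learns its query policy through bilevel optimization, which this sketch does not reproduce.

```python
import math
from typing import Callable, Dict, List, Set, Tuple


def entropy(probs: List[float]) -> float:
    """Shannon entropy (in nats) of a predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_nodes_to_query(gnn_probs: Dict[int, List[float]], budget: int) -> List[int]:
    """Pick up to `budget` nodes whose GNN predictions are most uncertain.

    Hypothetical heuristic: predictive entropy stands in for the expected
    gain of an LLM explanation. This is NOT BOSQ's learned bilevel policy.
    """
    ranked = sorted(gnn_probs, key=lambda n: entropy(gnn_probs[n]), reverse=True)
    return ranked[:budget]


def enrich_with_explanations(
    texts: Dict[int, str],
    selected: Set[int],
    query_llm: Callable[[str], str],
) -> Tuple[Dict[int, str], int]:
    """Call the LLM only for selected nodes; all other nodes keep raw text.

    Returns the enriched texts and the number of LLM calls actually made.
    """
    enriched: Dict[int, str] = {}
    calls = 0
    for node, text in texts.items():
        if node in selected:
            # Append the LLM-generated explanation to the node's own text.
            enriched[node] = text + "\nExplanation: " + query_llm(text)
            calls += 1
        else:
            enriched[node] = text
    return enriched, calls


if __name__ == "__main__":
    # Toy softmax outputs from a (hypothetical) GNN for four nodes.
    probs = {0: [0.98, 0.02], 1: [0.5, 0.5], 2: [0.7, 0.3], 3: [0.55, 0.45]}
    texts = {n: f"paper abstract {n}" for n in probs}
    chosen = select_nodes_to_query(probs, budget=2)
    print(chosen)  # [1, 3]
    _, n_calls = enrich_with_explanations(texts, set(chosen), lambda t: "stub")
    print(n_calls)  # 2
```

Here only two of four nodes trigger an LLM call; a fixed budget caps cost regardless of graph size, which is the property a learned sparse-querying policy aims to exploit more cleverly than a fixed uncertainty rule.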

📝 Abstract

LLMs have recently shown strong potential in enhancing node-level tasks on text-attributed graphs (TAGs) by providing explanation features. However, their practical use is severely limited by the high computational and monetary cost of repeated LLM queries. To illustrate, naively generating explanations for all nodes on a medium-sized benchmark like Photo (48k nodes) using a representative method (e.g., TAPE) would consume days of processing time. In this paper, we propose Bilevel-Optimized Sparse Querying (BOSQ), a general framework that selectively leverages LLM-derived explanation features to enhance performance on node-level tasks on TAGs. We design an adaptive sparse querying strategy that decides when to invoke LLMs, avoiding redundant or low-gain queries and significantly reducing computational overhead. Extensive experiments on six real-world TAG datasets involving two types of node-level tasks demonstrate that BOSQ achieves orders of magnitude speedups over existing GraphLLM methods while consistently delivering on-par or superior performance.
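The "days of processing time" figure for Photo can be sanity-checked with back-of-envelope arithmetic. The per-query latency (5 seconds, issued sequentially) and the 5% query budget below are illustrative assumptions, not numbers reported in the paper; only the 48k node count comes from the abstract.

```python
def sequential_query_hours(num_queries: int, seconds_per_query: float) -> float:
    """Wall-clock hours to issue `num_queries` LLM calls one after another."""
    return num_queries * seconds_per_query / 3600.0


NUM_NODES = 48_000        # Photo benchmark size, as stated in the abstract
SECONDS_PER_QUERY = 5.0   # assumption: latency not reported in the source
QUERY_FRACTION_PCT = 5    # assumption: hypothetical sparse-query budget

full_hours = sequential_query_hours(NUM_NODES, SECONDS_PER_QUERY)
sparse_queries = NUM_NODES * QUERY_FRACTION_PCT // 100
sparse_hours = sequential_query_hours(sparse_queries, SECONDS_PER_QUERY)

print(f"query every node: {full_hours:.1f} h (~{full_hours / 24:.1f} days)")
print(f"query {sparse_queries} nodes: {sparse_hours:.1f} h")
```

Under these assumptions, querying every node takes about 66.7 hours (roughly 2.8 days), while querying 2,400 nodes takes about 3.3 hours. Cost scales linearly with the number of queries, so any reduction in LLM calls translates directly into wall-clock and monetary savings.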

Problem

Research questions and friction points this paper is trying to address.

GraphLLM

text-attributed graphs

node-level tasks

computational cost

LLM querying

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bilevel Optimization

Sparse Querying

GraphLLM