Adaptive Tokenization: On the Hop-Overpriority Problem in Tokenized Graph Learning Models

πŸ“… 2025-05-19
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing tokenized graph learning models rely on handcrafted, fixed node orderings, leading to the β€œhop-overpriority problem”: overemphasizing local neighborhood signals while weakening global structural modeling and adaptability to heterophilic graphs. This work formally characterizes this bias for the first time and proposes the Learnable Graph Token List (LGTL) moduleβ€”a plug-and-play, end-to-end trainable token generation mechanism. LGTL integrates graph attention gating, learnable cross-hop weight allocation, and importance-driven intra-hop node selection, with theoretical guarantees for bias correction. By unifying representation learning across homophilic and heterophilic graphs, LGTL consistently improves Graph Transformer and Graph LLM performance across multiple benchmarks. On heterophilic graph tasks, it achieves average accuracy gains of 3.2–7.8%, demonstrating strong generalization and robustness.

πŸ“ Abstract
Graph Transformers, which leverage global attention to capture long-range dependencies in graph structures, have significantly advanced graph machine learning, but they face prohibitive computational complexity. Tokenized Graph Learning Models (TGLMs) address this issue by converting graphs into ordered token lists for scalable processing. Moreover, TGLMs empower Large Language Models (LLMs) to handle text-attributed graphs more effectively and are therefore also employed in Graph LLMs. However, existing TGLMs rely on hand-designed token lists, and their adaptability to diverse graph learning scenarios remains unexplored. In this paper, we first conduct extensive empirical and theoretical preliminary studies of hand-designed token lists. Surprisingly, we identify an unexplored hop-overpriority problem: common pre-defined token lists overemphasize nearby nodes and overwhelm the ability of TGLMs to balance local and global signals. This phenomenon is especially harmful for heterophilic graphs. To address this problem, we propose the Learnable Graph Token List (LGTL), a plug-and-play module that replaces hand-designed token lists in TGLMs. Specifically, LGTL adaptively adjusts the weights across hops and prioritizes informative nodes within hops through a graph attention gate module and a selection module, respectively. In this way, contextually informative nodes can be adaptively emphasized for both homophilic and heterophilic graphs. Furthermore, we theoretically show that LGTL can address the hop-overpriority problem. Extensive experiments on benchmarks validate the efficacy of LGTL across both Graph Transformer and Graph LLM backbones.
Problem

Research questions and friction points this paper is trying to address.

TGLMs overemphasize nearby nodes, harming heterophilic graphs.
Hand-designed token lists lack adaptability in graph learning.
Existing token lists fail to balance local and global signals.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes Learnable Graph Token List (LGTL)
Adaptively adjusts weights across hops
Prioritizes informative nodes within hops
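The two mechanisms listed above can be illustrated with a toy sketch: an attention gate scores each hop against the center node so that cross-hop weights are learned rather than fixed by distance, and a selection step keeps only the most informative nodes within each hop. This is a minimal NumPy illustration of the idea, not the authors' implementation; the function name `lgtl_token_list`, the single projection `W_gate`, and the mean-based hop scoring are assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lgtl_token_list(center, hops, W_gate, k=2):
    """Toy sketch of LGTL-style token construction (hypothetical API).

    center : (d,) feature vector of the target node
    hops   : list of (n_i, d) arrays, neighbors grouped by hop distance
    W_gate : (d, d) learnable projection for the attention gate
    k      : number of nodes kept per hop by the selection module
    """
    q = center @ W_gate                        # gated query from the center node
    selected, hop_scores = [], []
    for H in hops:
        scores = H @ q                         # intra-hop importance scores
        top = np.argsort(scores)[::-1][:k]     # keep the k most informative nodes
        selected.append(H[top])
        hop_scores.append(scores[top].mean())  # summary score for the whole hop
    # cross-hop gate: softmax over hop-level scores, so hop weights are
    # learned from content instead of fixed by a hand-designed ordering
    hop_weights = softmax(np.array(hop_scores))
    tokens = np.concatenate(
        [w * S for w, S in zip(hop_weights, selected)], axis=0)
    return tokens, hop_weights
```

In a real model the gate and selection would be trained end to end with the backbone; the point of the sketch is only that the token list becomes a function of learned scores, so distant but informative nodes (as in heterophilic graphs) can outweigh nearby ones.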