Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners

📅 2025-11-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) employed as graph reasoners lack built-in invariance to graph symmetries: their outputs often vary under node relabeling, edge reordering, or serialization format changes. Method: We propose a three-factor decomposition of graph serialization (node labeling, edge encoding, syntactic structure), construct a benchmark covering diverse structural perturbations, and design novel spectral tasks to evaluate generalization to unseen tasks. Using fine-grained control over serialization and contrastive comparisons across variants, we systematically quantify model sensitivity to these transformations. Contribution/Results: Larger (non-fine-tuned) models are markedly more robust; fine-tuning reduces sensitivity to node relabeling but can increase sensitivity to edge ordering and serialization format, and it does not consistently improve generalization to unseen tasks. The paper contributes a principled evaluation setup and practical tooling for measuring structural invariance in graph-reasoning LLMs.
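To make the three serialization factors concrete, here is a minimal sketch (assumptions: plain edge-list and adjacency-list text formats, which are illustrative and not the paper's actual prompt templates) that produces different serializations of the same graph by varying node labeling, edge order, and syntax:

```python
# Illustrative sketch (not the paper's exact prompts): serializing the same
# graph under different node labelings, edge orderings, and syntaxes.
import random

def serialize(edges, relabel=False, shuffle_edges=False, syntax="edge_list", seed=0):
    """Return a text serialization of an undirected graph given as an edge list."""
    rng = random.Random(seed)
    nodes = sorted({v for e in edges for v in e})
    # Factor 1: node labeling (identity vs. a random permutation of labels).
    if relabel:
        perm = nodes[:]
        rng.shuffle(perm)
        mapping = dict(zip(nodes, perm))
    else:
        mapping = {v: v for v in nodes}
    edges = [(mapping[u], mapping[v]) for u, v in edges]
    # Factor 2: edge encoding/order (canonical sorted order vs. shuffled).
    edges = sorted(map(tuple, map(sorted, edges)))
    if shuffle_edges:
        rng.shuffle(edges)
    # Factor 3: syntax (edge-list text vs. adjacency-list text).
    if syntax == "edge_list":
        return "Edges: " + ", ".join(f"({u}, {v})" for u, v in edges)
    adj = {v: [] for v in mapping.values()}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return "\n".join(f"{v}: {sorted(ns)}" for v, ns in sorted(adj.items()))

triangle_plus = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(serialize(triangle_plus))  # canonical form
print(serialize(triangle_plus, relabel=True, shuffle_edges=True,
                syntax="adjacency_list", seed=7))  # same graph, different text
```

Any two outputs of `serialize` for the same input graph describe the same structure, so a serialization-invariant reasoner should answer identically on both.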

📝 Abstract
While promising, graph reasoners based on Large Language Models (LLMs) lack built-in invariance to symmetries in graph representations. Operating on sequential graph serializations, LLMs can produce different outputs under node reindexing, edge reordering, or formatting changes, raising robustness concerns. We systematically analyze these effects, studying how fine-tuning impacts encoding sensitivity as well as generalization on unseen tasks. We propose a principled decomposition of graph serializations into node labeling, edge encoding, and syntax, and evaluate LLM robustness to variations of each of these factors on a comprehensive benchmarking suite. We also contribute a novel set of spectral tasks to further assess generalization abilities of fine-tuned reasoners. Results show that larger (non-fine-tuned) models are more robust. Fine-tuning reduces sensitivity to node relabeling but may increase it to variations in structure and format, while it does not consistently improve performance on unseen tasks.
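The robustness question in the abstract can be phrased as an agreement check across equivalent serializations. The sketch below reuses `serialize` from the example above; `ask_model` is a hypothetical placeholder for an LLM call, and the agreement score is only one possible way to quantify sensitivity, not necessarily the paper's metric:

```python
# Hedged sketch of an invariance check: query a reasoner with several
# serializations of the same graph and measure how often answers agree.
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for whatever LLM interface is used.
    raise NotImplementedError("replace with an actual LLM call")

def invariance_rate(graph_edges, question, variants):
    """Fraction of serialization variants whose answer matches the modal answer."""
    answers = []
    for kwargs in variants:
        prompt = serialize(graph_edges, **kwargs) + "\n" + question
        answers.append(ask_model(prompt).strip().lower())
    modal_answer, modal_count = Counter(answers).most_common(1)[0]
    return modal_count / len(answers)

variants = [
    {},                                    # canonical serialization
    {"relabel": True, "seed": 1},          # node relabeling
    {"shuffle_edges": True, "seed": 2},    # edge reordering
    {"syntax": "adjacency_list"},          # format change
]
# rate = invariance_rate(triangle_plus, "Is this graph connected?", variants)
```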
Problem

Research questions and friction points this paper is trying to address.

LLM graph reasoners lack invariance to graph representation symmetries
Graph serialization variations cause inconsistent LLM outputs and robustness issues
Fine-tuning impacts encoding sensitivity and generalization on unseen tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically analyzes LLM invariance to graph serialization variations
Proposes decomposition of graph serializations into three components
Introduces spectral tasks to assess generalization of fine-tuned models (see the sketch after this list)
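As an illustration of what a spectral task target can look like (the concrete choice of algebraic connectivity here is an assumption, not necessarily one of the paper's tasks), the quantity a reasoner could be asked about is computable from the graph Laplacian:

```python
# Illustrative spectral-style target: the algebraic connectivity
# (second-smallest eigenvalue of the combinatorial Laplacian L = D - A).
import numpy as np

def laplacian_spectrum(edges, num_nodes):
    """Eigenvalues of the combinatorial Laplacian, sorted ascending."""
    A = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))

spectrum = laplacian_spectrum([(0, 1), (1, 2), (2, 0), (2, 3)], num_nodes=4)
algebraic_connectivity = spectrum[1]  # > 0 iff the graph is connected
print(spectrum, algebraic_connectivity)
```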
Daniel Herbst
Technical University of Munich
Lea Karbeska
University of Cambridge
Divyanshu Kumar
Enkrypt AI
Akanksha Ahuja
University of Cambridge
Fatemeh Gholamzadeh Nasrabadi
University of Amsterdam
Fabrizio Frasca
Postdoctoral Fellow, Technion – Israel Institute of Technology
machine learning, graph representation learning, geometric deep learning, artificial intelligence