Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners

📅 2025-11-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) employed as graph reasoners lack built-in invariance to graph symmetries: their outputs often vary under node relabeling, edge reordering, or serialization format changes. Method: We propose a three-factor decomposition of graph serialization (node labeling, edge encoding, syntactic structure), construct a benchmark covering diverse structural perturbations, and design novel spectral tasks to evaluate generalization to unseen tasks. Using fine-grained control over serialization and contrastive comparisons across variants, we systematically quantify model sensitivity to these transformations. Contribution/Results: Larger (non-fine-tuned) models are markedly more robust; fine-tuning reduces sensitivity to node relabeling but can increase sensitivity to edge ordering and serialization format, and it does not consistently improve generalization to unseen tasks. The paper contributes a principled evaluation setup and practical tooling for measuring structural invariance in graph-reasoning LLMs.
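To make the three serialization factors concrete, here is a minimal sketch (assumptions: plain edge-list and adjacency-list text formats, which are illustrative and not the paper's actual prompt templates) that produces different serializations of the same graph by varying node labeling, edge order, and syntax:

```python
# Illustrative sketch (not the paper's exact prompts): serializing the same
# graph under different node labelings, edge orderings, and syntaxes.
import random

def serialize(edges, relabel=False, shuffle_edges=False, syntax="edge_list", seed=0):
    """Return a text serialization of an undirected graph given as an edge list."""
    rng = random.Random(seed)
    nodes = sorted({v for e in edges for v in e})
    # Factor 1: node labeling (identity vs. a random permutation of labels).
    if relabel:
        perm = nodes[:]
        rng.shuffle(perm)
        mapping = dict(zip(nodes, perm))
    else:
        mapping = {v: v for v in nodes}
    edges = [(mapping[u], mapping[v]) for u, v in edges]
    # Factor 2: edge encoding/order (canonical sorted order vs. shuffled).
    edges = sorted(map(tuple, map(sorted, edges)))
    if shuffle_edges:
        rng.shuffle(edges)
    # Factor 3: syntax (edge-list text vs. adjacency-list text).
    if syntax == "edge_list":
        return "Edges: " + ", ".join(f"({u}, {v})" for u, v in edges)
    adj = {v: [] for v in mapping.values()}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return "\n".join(f"{v}: {sorted(ns)}" for v, ns in sorted(adj.items()))

triangle_plus = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(serialize(triangle_plus))  # canonical form
print(serialize(triangle_plus, relabel=True, shuffle_edges=True,
                syntax="adjacency_list", seed=7))  # same graph, different text
```

Any two outputs of `serialize` for the same input graph describe the same structure, so a serialization-invariant reasoner should answer identically on both.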

📝 Abstract
While promising, graph reasoners based on Large Language Models (LLMs) lack built-in invariance to symmetries in graph representations. Operating on sequential graph serializations, LLMs can produce different outputs under node reindexing, edge reordering, or formatting changes, raising robustness concerns. We systematically analyze these effects, studying how fine-tuning impacts encoding sensitivity as well as generalization on unseen tasks. We propose a principled decomposition of graph serializations into node labeling, edge encoding, and syntax, and evaluate LLM robustness to variations of each of these factors on a comprehensive benchmarking suite. We also contribute a novel set of spectral tasks to further assess generalization abilities of fine-tuned reasoners. Results show that larger (non-fine-tuned) models are more robust. Fine-tuning reduces sensitivity to node relabeling but may increase it to variations in structure and format, while it does not consistently improve performance on unseen tasks.
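The robustness question in the abstract can be phrased as an agreement check across equivalent serializations. The sketch below reuses `serialize` from the example above; `ask_model` is a hypothetical placeholder for an LLM call, and the agreement score is only one possible way to quantify sensitivity, not necessarily the paper's metric:

```python
# Hedged sketch of an invariance check: query a reasoner with several
# serializations of the same graph and measure how often answers agree.
from collections import Counter

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for whatever LLM interface is used.
    raise NotImplementedError("replace with an actual LLM call")

def invariance_rate(graph_edges, question, variants):
    """Fraction of serialization variants whose answer matches the modal answer."""
    answers = []
    for kwargs in variants:
        prompt = serialize(graph_edges, **kwargs) + "\n" + question
        answers.append(ask_model(prompt).strip().lower())
    modal_answer, modal_count = Counter(answers).most_common(1)[0]
    return modal_count / len(answers)

variants = [
    {},                                    # canonical serialization
    {"relabel": True, "seed": 1},          # node relabeling
    {"shuffle_edges": True, "seed": 2},    # edge reordering
    {"syntax": "adjacency_list"},          # format change
]
# rate = invariance_rate(triangle_plus, "Is this graph connected?", variants)
```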
Problem

Research questions and friction points this paper is trying to address.

LLM graph reasoners lack invariance to graph representation symmetries
Graph serialization variations cause inconsistent LLM outputs and robustness issues
Fine-tuning impacts encoding sensitivity and generalization on unseen tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematically analyzes LLM invariance to graph serialization variations
Proposes decomposition of graph serializations into three components
Introduces spectral tasks to assess generalization of fine-tuned models (see the sketch after this list)
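As an illustration of what a spectral task target can look like (the concrete choice of algebraic connectivity here is an assumption, not necessarily one of the paper's tasks), the quantity a reasoner could be asked about is computable from the graph Laplacian:

```python
# Illustrative spectral-style target: the algebraic connectivity
# (second-smallest eigenvalue of the combinatorial Laplacian L = D - A).
import numpy as np

def laplacian_spectrum(edges, num_nodes):
    """Eigenvalues of the combinatorial Laplacian, sorted ascending."""
    A = np.zeros((num_nodes, num_nodes))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))

spectrum = laplacian_spectrum([(0, 1), (1, 2), (2, 0), (2, 3)], num_nodes=4)
algebraic_connectivity = spectrum[1]  # > 0 iff the graph is connected
print(spectrum, algebraic_connectivity)
```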
Daniel Herbst
Technical University of Munich
Lea Karbeska
University of Cambridge
Divyanshu Kumar
Enkrypt AI
Akanksha Ahuja
University of Cambridge
Fatemeh Gholamzadeh Nasrabadi
University of Amsterdam
Fabrizio Frasca
Postdoctoral Fellow, Technion – Israel Institute of Technology
machine learning, graph representation learning, geometric deep learning, artificial intelligence