Transformers Can Learn Connectivity in Some Graphs but Not Others

📅 2025-09-26

🤖 AI Summary

Objective: This study investigates whether Transformer-based large language models (LLMs) can learn transitive relations (connectivity reasoning) over directed graphs from training examples, and how this capacity changes with model size. Method: Transformers of varying parameter counts are trained and evaluated on synthetically generated directed graphs with different topologies: low-dimensional grids, higher-dimensional grids, and graphs with many disconnected components. Contribution/Results: Graph dimensionality and topology, not only model capacity, strongly affect generalization. Low-dimensional grid graphs are learned effectively, higher-dimensional grids are harder, and graphs with many disconnected components remain difficult. Increasing model scale improves generalization on grid graphs but is not reported to resolve the difficulty on non-grid, fragmented graphs. The work characterizes where the inductive bias of Transformers supports learning graph transitivity, providing empirical evidence relevant to settings such as causal inference.

📝 Abstract

Reasoning capability is essential to ensure the factual correctness of the responses of transformer-based Large Language Models (LLMs), and robust reasoning about transitive relations is instrumental in many settings, such as causal inference. Hence, it is essential to investigate the capability of transformers in the task of inferring transitive relations (e.g., if A causes B and B causes C, then A causes C). The task of inferring transitive relations is equivalent to the task of connectivity in directed graphs (e.g., if there is a path from A to B and a path from B to C, then there is a path from A to C). Past research focused on whether transformers can learn to infer transitivity from in-context examples provided in the input prompt. However, transformers' capability to infer transitive relations from training examples, and how scaling affects this capability, remain unexplored. In this study, we seek to answer these questions by generating directed graphs to train transformer models of varying sizes and evaluating their ability to infer transitive relations for various graph sizes. Our findings suggest that transformers are capable of learning connectivity on "grid-like" directed graphs, where each node can be embedded in a low-dimensional subspace and connectivity is easily inferable from the embeddings of the nodes. We find that the dimensionality of the underlying grid graph is a strong predictor of transformers' ability to learn the connectivity task: higher-dimensional grid graphs pose a greater challenge than low-dimensional grid graphs. In addition, we observe that increasing the model scale leads to increasingly better generalization when inferring connectivity over grid graphs. However, if the graph is not a grid graph and contains many disconnected components, transformers struggle to learn the connectivity task, especially when the number of components is large.
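
To make the "connectivity is easily inferable from the embeddings" idea concrete, here is a minimal Python sketch. It is not the authors' construction: it assumes a directed grid in which every node points to its +1 neighbor along each axis, and the helper names (`grid_edges`, `reachable_bfs`, `reachable_coords`) are hypothetical. Under that assumption, reachability found by graph search coincides with a coordinate-wise comparison of the two nodes.

```python
from collections import deque
from itertools import product


def grid_edges(shape):
    """Adjacency list of an assumed directed grid: each node points to its
    +1 neighbor along every axis (illustrative, not the paper's generator)."""
    edges = {}
    for node in product(*(range(n) for n in shape)):
        neighbors = []
        for axis, size in enumerate(shape):
            if node[axis] + 1 < size:
                nxt = node[:axis] + (node[axis] + 1,) + node[axis + 1:]
                neighbors.append(nxt)
        edges[node] = neighbors
    return edges


def reachable_bfs(edges, src, dst):
    """Ground-truth connectivity by breadth-first search."""
    seen = {src}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return True
        for nxt in edges[cur]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def reachable_coords(src, dst):
    """Connectivity read directly from node coordinates (the 'embedding')."""
    return all(s <= d for s, d in zip(src, dst))


edges = grid_edges((4, 4))
nodes = list(edges)
# Check that the two notions of reachability agree on every ordered pair.
agree = all(
    reachable_bfs(edges, u, v) == reachable_coords(u, v)
    for u in nodes
    for v in nodes
)
print(agree)
```

Passing a longer `shape` tuple, such as `(3, 3, 3)`, gives a higher-dimensional grid under the same assumption; the coordinate rule still holds, but more coordinates must be compared per pair.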

Problem

Research questions and friction points this paper is trying to address.

Investigating transformers' ability to learn transitive relations from training examples

Evaluating how model scaling affects connectivity inference in directed graphs

Analyzing transformers' performance on grid-like versus disconnected graph structures
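
For contrast with grid-like graphs, the following Python sketch builds a graph with several disconnected components. The construction is an assumption made purely for illustration (disjoint directed chains with shuffled node IDs); the summary above does not specify how the paper generates its disconnected graphs, and `disjoint_chains` and `connectivity_labels` are hypothetical helpers. Here the node ID carries no coordinate-like information, so connectivity depends on which chain a node belongs to and where it sits in that chain.

```python
import random


def disjoint_chains(num_chains, chain_len, seed=0):
    """Assumed construction: split shuffled node IDs into disjoint directed chains."""
    rng = random.Random(seed)
    ids = list(range(num_chains * chain_len))
    rng.shuffle(ids)
    chains = [ids[c * chain_len:(c + 1) * chain_len] for c in range(num_chains)]
    edges = [(ch[i], ch[i + 1]) for ch in chains for i in range(chain_len - 1)]
    return chains, edges


def connectivity_labels(chains):
    """Label every ordered pair (u, v): True if v is reachable from u."""
    pos = {node: (c, i) for c, ch in enumerate(chains) for i, node in enumerate(ch)}
    nodes = sorted(pos)
    return {
        (u, v): pos[u][0] == pos[v][0] and pos[u][1] <= pos[v][1]
        for u in nodes
        for v in nodes
    }


chains, edges = disjoint_chains(num_chains=3, chain_len=4)
labels = connectivity_labels(chains)
# Fraction of ordered pairs that are connected; it shrinks as components multiply.
positive_fraction = sum(labels.values()) / len(labels)
print(len(edges), positive_fraction)
```

With a fixed number of nodes, splitting them into more chains leaves fewer connected pairs, which is one plausible reason a larger number of components could make the task harder; the source does not state the cause.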

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformers learn connectivity in grid-like directed graphs

Increasing model scale improves generalization on grid graph connectivity

High-dimensional grid graphs challenge transformer learning ability
