🤖 AI Summary
This work investigates the depth-width trade-off in Transformers for graph algorithmic tasks, addressing the central question: "Can constant-depth inference be achieved with linear width?" Leveraging circuit complexity analysis, formal modeling of the attention mechanism, constructive architecture design, and rigorous formal verification, we establish the first proof that linear-width Transformers can exactly solve fundamental graph problems, including connectivity and shortest path, in constant depth. Crucially, we uncover a non-monotonic width-depth relationship: certain graph tasks provably require quadratic width to admit constant-depth solutions. Empirical evaluation confirms that our theory-guided depth-compression framework achieves zero accuracy loss while substantially accelerating inference, demonstrating both the theoretical necessity and the practical efficacy of depth reduction under linear-width constraints.
📝 Abstract
Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. For such algorithmic tasks, a key question is the minimal size of a transformer that can implement the task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width), logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly. We analyze this setting and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We support our theoretical results with empirical evaluations.
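As an informal illustration of the kind of parallel computation at play (this is not the paper's transformer construction), graph connectivity can be decided by repeatedly squaring the Boolean adjacency matrix: each squaring doubles the path length covered, so O(log n) dense matrix operations, each of the parallel-friendly form a wide layer can emulate, suffice. A minimal sketch, where `connected` is a hypothetical helper name:

```python
import numpy as np

def connected(adj: np.ndarray, s: int, t: int) -> bool:
    """Decide s-t connectivity via repeated Boolean squaring of the
    adjacency matrix (transitive closure in O(log n) squarings).
    Illustration only -- not the transformer construction from the paper."""
    n = adj.shape[0]
    # Include self-loops so squaring accumulates all paths up to length 2^k.
    reach = (adj > 0) | np.eye(n, dtype=bool)
    for _ in range(max(1, int(np.ceil(np.log2(n))))):
        # Boolean matrix product: entry (i, j) becomes True iff some
        # intermediate vertex k has reach[i, k] and reach[k, j].
        reach = (reach.astype(int) @ reach.astype(int)) > 0
    return bool(reach[s, t])

# Example: path graph 0-1-2 plus an isolated vertex 3.
A = np.zeros((4, 4), dtype=int)
A[0, 1] = A[1, 0] = A[1, 2] = A[2, 1] = 1
```

Each squaring is a single dense matrix product, which is the operation that wide, shallow architectures handle well; the paper's contribution is showing that with linear width the number of such rounds can be made constant rather than logarithmic.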