Do Transformers Use their Depth Adaptively? Evidence from a Relational Reasoning Task

📅 2026-04-14
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study investigates whether Transformer models adaptively use network depth according to task difficulty. Using a controlled multi-hop kinship reasoning task, it applies logit lens analysis to trace how predictions evolve across layers and causal patching to trace cross-token information flow, then compares depth usage across difficulty levels. For pretrained models the evidence is limited: some larger models reach plausible answers to easier tasks in fewer layers, and models generally use more layers to integrate information across tokens as the reasoning chain grows. Fine-tuned models show clearer and more consistent depth-adaptive behavior, most prominently under less constrained fine-tuning that does not preserve general language modeling abilities. The work provides empirical evidence of adaptive depth utilization in Transformers within a controlled reasoning setting.

📝 Abstract
We investigate whether transformers use their depth adaptively across tasks of increasing difficulty. Using a controlled multi-hop relational reasoning task based on family stories, where difficulty is determined by the number of relationship hops that must be composed, we monitor (i) how predictions evolve across layers via early readouts (the logit lens) and (ii) how task-relevant information is integrated across tokens via causal patching. For pretrained models, we find some limited evidence for adaptive depth use: some larger models need fewer layers to arrive at plausible answers for easier tasks, and models generally use more layers to integrate information across tokens as chain length increases. For models finetuned on the task, we find clearer and more consistent evidence of adaptive depth use, with the effect being stronger for less constrained finetuning regimes that do not preserve general language modeling abilities.
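
The early-readout idea in the abstract can be illustrated with a minimal sketch. It assumes a toy residual stream stored as a NumPy array, a parameter-free layer norm standing in for the model's final normalization, and a hypothetical helper `first_stable_layer` that reports the earliest layer from which the target token stays the top prediction. None of these names come from the paper, and the paper's actual readout procedure may differ.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Parameter-free layer norm over the last axis (an assumption of this sketch)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def logit_lens(hidden_states, unembed):
    """Decode each layer's residual-stream vector with the final unembedding.

    hidden_states: (n_layers, d_model) residual stream at one token position
    unembed:       (d_model, vocab_size) output projection
    returns:       (n_layers, vocab_size) probabilities per layer
    """
    logits = layer_norm(hidden_states) @ unembed
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=-1, keepdims=True)

def first_stable_layer(probs, target):
    """Earliest layer from which `target` stays top-1 through the last layer, or -1."""
    top1 = probs.argmax(axis=-1)
    stable = -1
    for layer in range(len(top1) - 1, -1, -1):
        if top1[layer] != target:
            break
        stable = layer
    return stable

# Hand-built toy data: 4 layers, d_model = 4, vocabulary of 3 tokens.
hidden = np.array([[0, 1, 0, 0],
                   [1, 0, 0, 0],
                   [0, 0, 2, 0],
                   [0, 0, 3, 0]], dtype=float)
unembed = np.eye(4)[:, :3]
probs = logit_lens(hidden, unembed)
print(first_stable_layer(probs, target=2))
```

Under this reading, adaptive depth use would show up as a smaller stable-layer index for shorter relationship chains than for longer ones.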
Problem

Research questions and friction points this paper is trying to address.

Transformers
adaptive depth
relational reasoning
multi-hop reasoning
task difficulty
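
To make the notion of difficulty as hop count concrete, here is a small sketch of composing kinship relations along a chain. The composition table `COMPOSE`, the helper names, and the story wording are all hypothetical illustrations; the paper's actual task format is not specified in this summary.

```python
# Hypothetical table: (relation so far, next relation) -> composed relation.
# Read ("mother", "father") as "mother's father", i.e. grandfather.
COMPOSE = {
    ("mother", "mother"): "grandmother",
    ("mother", "father"): "grandfather",
    ("father", "mother"): "grandmother",
    ("father", "father"): "grandfather",
    ("mother", "brother"): "uncle",
    ("father", "sister"): "aunt",
    ("grandmother", "mother"): "great-grandmother",
    ("grandfather", "father"): "great-grandfather",
}

def compose_chain(chain):
    """Fold a chain of relations into one; the hop count is len(chain)."""
    relation = chain[0]
    for nxt in chain[1:]:
        relation = COMPOSE[(relation, nxt)]
    return relation

def make_story(names, chain):
    """Render one sentence per hop, plus the question about the chain's endpoints."""
    facts = [f"{names[i + 1]} is {names[i]}'s {rel}." for i, rel in enumerate(chain)]
    question = f"How is {names[-1]} related to {names[0]}?"
    return " ".join(facts), question

story, question = make_story(["Ann", "Beth", "Carl"], ["mother", "father"])
print(story)      # Beth is Ann's mother. Carl is Beth's father.
print(question)   # How is Carl related to Ann?
print(compose_chain(["mother", "father"]))  # grandfather
```

A one-hop chain can be answered by reading a single fact, while each additional hop requires one more composition step, which is what makes chain length a controllable difficulty knob.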
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive depth
relational reasoning
logit lens
causal patching
transformer analysis
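
Causal patching, listed above, can be sketched on a toy model. The example below assumes a stand-in residual-stream network in NumPy where uniform causal averaging replaces attention; `run_model` and `patching_effects` are hypothetical names, and this is not the paper's model or exact procedure. The idea is to overwrite one token's state at one layer with the state from a corrupted run and measure how much the final-position output changes.

```python
import numpy as np

def run_model(tokens, weights, patch=None):
    """Toy residual-stream model; returns the final-position state and a cache.

    tokens:  (seq_len, d_model) input embeddings
    weights: list of (d_model, d_model) matrices, one per layer
    patch:   optional (layer, position, vector) that overwrites the residual
             stream at `position` on entry to `layer`
    """
    h = np.array(tokens, dtype=float)
    cache = []
    for layer, w in enumerate(weights):
        if patch is not None and patch[0] == layer:
            h[patch[1]] = patch[2]
        cache.append(h.copy())  # state on entry to this layer
        # Uniform causal "attention": position t reads the mean of positions <= t.
        mixed = np.cumsum(h, axis=0) / np.arange(1, len(h) + 1)[:, None]
        h = h + np.tanh(mixed @ w)
    return h[-1], cache

def patching_effects(clean, corrupt, weights, position):
    """Per-layer change in the final output when `position` is patched from the corrupt run."""
    clean_out, _ = run_model(clean, weights)
    _, corrupt_cache = run_model(corrupt, weights)
    effects = []
    for layer in range(len(weights)):
        patched_out, _ = run_model(
            clean, weights, patch=(layer, position, corrupt_cache[layer][position])
        )
        effects.append(float(np.linalg.norm(patched_out - clean_out)))
    return effects

rng = np.random.default_rng(0)
weights = [0.5 * rng.normal(size=(4, 4)) for _ in range(3)]
clean = rng.normal(size=(3, 4))
corrupt = clean.copy()
corrupt[0] = rng.normal(size=4)  # corrupt only the first token
print(patching_effects(clean, corrupt, weights, position=0))
```

Layers at which patching the source token still changes the final output are layers where its information has not yet been fully moved to the final position, so a longer tail of nonzero effects suggests that more depth is being used for cross-token integration.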