Context-Free Recognition with Transformers

📅 2026-01-05
🏛️ arXiv.org
🤖 AI Summary
This work investigates whether Transformer models can overcome their theoretical limitations in formal language processing to effectively recognize context-free languages (CFLs). We propose a looped Transformer architecture that, with only O(log n) layers and up to O(n⁶) padding tokens, is theoretically capable of recognizing all CFLs. For unambiguous CFLs, we further reduce the padding complexity to O(n³). Our study provides the first proof that looped Transformers can fully recognize CFLs, highlighting the critical role of grammatical unambiguity in computational efficiency. Empirical results corroborate the effectiveness of the proposed mechanism on tasks requiring logarithmic-depth computation.
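The "O(log n) layers" idea rests on weight sharing: a single block is applied repeatedly, so any computation of logarithmic depth fits into logarithmically many passes over the same parameters. A minimal, hypothetical sketch of this looping pattern in plain Python, using pointer-doubling prefix sums as a classic log-depth computation (the function and names are ours, not the paper's):

```python
import math

# Hypothetical illustration of the looping idea: one weight-shared "block"
# applied ceil(log2 n) times. Here the block does pointer-doubling prefix
# sums, a classic log-depth computation; names are ours, not the paper's.
def looped_prefix_sum(xs):
    n = len(xs)
    state = list(xs)
    step = 1  # doubles each pass, so ceil(log2 n) passes cover the input
    for _ in range(math.ceil(math.log2(max(n, 2)))):
        state = [state[i] + (state[i - step] if i >= step else 0)
                 for i in range(n)]
        step *= 2
    return state
```

After pass k, each position has accumulated the 2ᵏ elements ending at it, which is why a fixed block reused log₂(n) times suffices; a looped transformer exploits the same reuse instead of stacking n distinct layers.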

📝 Abstract
Transformers excel empirically on tasks that process well-formed inputs according to some grammar, such as natural language and code. However, it remains unclear how they can process grammatical syntax. In fact, under standard complexity conjectures, standard transformers cannot recognize context-free languages (CFLs), a canonical formalism to describe syntax, or even regular languages, a subclass of CFLs. Past work proves that $\mathcal{O}(\log(n))$ looping layers (w.r.t. input length n) allow transformers to recognize regular languages, but the question of context-free recognition remained open. In this work, we show that looped transformers with $\mathcal{O}(\log(n))$ looping layers and $\mathcal{O}(n^6)$ padding tokens can recognize all CFLs. However, training and inference with $\mathcal{O}(n^6)$ padding tokens is potentially impractical. Fortunately, we show that, for natural subclasses such as unambiguous CFLs, the recognition problem on transformers becomes more tractable, requiring $\mathcal{O}(n^3)$ padding. We empirically validate our results and show that looping helps on a language that provably requires logarithmic depth. Overall, our results shed light on the intricacy of CFL recognition by transformers: While general recognition may require an intractable amount of padding, natural constraints such as unambiguity yield efficient recognition algorithms.
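The $\mathcal{O}(n^3)$ figure for unambiguous CFLs matches the cost of classical chart parsing: CYK fills $\mathcal{O}(n^2)$ table cells, each by trying $\mathcal{O}(n)$ split points. As an illustration of where that count comes from (not the paper's construction), here is a standard CYK recognizer; the grammar, for {aⁿbⁿ : n ≥ 1}, is a hypothetical example in Chomsky normal form:

```python
# Illustrative CYK recognizer (not the paper's construction). The grammar
# below, for {a^n b^n : n >= 1}, is a hypothetical example in Chomsky
# normal form: productions are (terminal,) or (Nonterminal, Nonterminal).
GRAMMAR = {
    "S": [("A", "X"), ("A", "B")],
    "X": [("S", "B")],
    "A": [("a",)],
    "B": [("b",)],
}

def cyk_recognize(word, grammar, start="S"):
    """Return True iff `word` is derivable from `start`.

    Fills O(n^2) cells, each via O(n) split points: O(n^3) work total.
    """
    n = len(word)
    if n == 0:
        return False
    # table[i][l] = nonterminals deriving the span word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):  # length-1 spans from terminal rules
        for lhs, prods in grammar.items():
            if (ch,) in prods:
                table[i][0].add(lhs)
    for length in range(2, n + 1):            # span length
        for i in range(n - length + 1):       # span start
            for split in range(1, length):    # split point inside the span
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, prods in grammar.items():
                    for p in prods:
                        if len(p) == 2 and p[0] in left and p[1] in right:
                            table[i][length - 1].add(lhs)
    return start in table[0][n - 1]
```

For example, `cyk_recognize("aabb", GRAMMAR)` is `True` while `cyk_recognize("aab", GRAMMAR)` is `False`. Intuitively, each padding token in the construction can stand in for one unit of this tabular work, which is why unambiguity (one parse per string, so no double counting across splits) helps bring the padding down toward the $\mathcal{O}(n^3)$ chart-parsing budget.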
Problem

Research questions and friction points this paper is trying to address.

context-free languages
transformers
language recognition
computational complexity
unambiguous CFLs
Innovation

Methods, ideas, or system contributions that make the work stand out.

looped transformers
context-free languages
padding tokens
unambiguous CFLs
logarithmic depth