Gated Tree Cross-attention for Checkpoint-Compatible Syntax Injection in Decoder-Only LLMs

📅 2026-01-23
🏛️ arXiv.org
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the brittleness of decoder-only large language models under minor grammatical perturbations, together with the observation that directly injecting explicit syntactic structure into an existing checkpoint can interfere with its pretrained competence. The authors propose Gated Tree Cross-Attention (GTCA), an auxiliary cross-attention branch that is compatible with existing model checkpoints and leaves the backbone architecture unchanged. GTCA reads precomputed constituency chunks as external memory and uses a token update mask together with staged training to control the scope and timing of structural updates. Across benchmarks and Transformer backbones, GTCA improves syntactic robustness beyond continued-training baselines while preserving performance on multiple-choice question answering and commonsense reasoning.

📝 Abstract
Decoder-only large language models achieve strong broad performance but are brittle to minor grammatical perturbations, undermining reliability for downstream reasoning. However, directly injecting explicit syntactic structure into an existing checkpoint can interfere with its pretrained competence. We introduce a checkpoint-compatible gated tree cross-attention (GTCA) branch that reads precomputed constituency chunk memory while leaving the backbone architecture unchanged. Our design uses a token update mask and staged training to control the scope and timing of structural updates. Across benchmarks and Transformer backbones, GTCA strengthens syntactic robustness beyond continued-training baselines without compromising multiple-choice QA performance or commonsense reasoning, providing a practical checkpoint-compatible route to more syntax-robust decoder-only LLMs.
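
The abstract gives no equations, so the following is only an illustrative sketch of how a gated cross-attention branch with a token update mask might behave, not the authors' implementation. Everything specific here is an assumption: the function names, the single attention head without learned projections, the scalar tanh gate, and the choice to start that gate at zero so the branch initially leaves the checkpoint's hidden states untouched. The chunk memory is assumed to share the hidden dimension of the token states.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query, memory):
    """One token attends over the chunk memory (scaled dot-product, no projections)."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, chunk)) / math.sqrt(dim)
        for chunk in memory
    ]
    weights = softmax(scores)
    return [sum(w * chunk[i] for w, chunk in zip(weights, memory)) for i in range(dim)]


def gated_tree_cross_attention(hidden, memory, update_mask, gate):
    """Residual update: h + mask * tanh(gate) * CrossAttn(h, memory).

    hidden      -- list of token vectors from the frozen or pretrained backbone
    memory      -- list of precomputed constituency chunk vectors (hypothetical encoding)
    update_mask -- 0/1 per token; 0 means the token is never updated by the branch
    gate        -- scalar gate parameter; tanh(0) = 0 makes the branch a no-op (assumed init)
    """
    strength = math.tanh(gate)
    updated = []
    for token, allowed in zip(hidden, update_mask):
        context = cross_attend(token, memory)
        updated.append([x + allowed * strength * c for x, c in zip(token, context)])
    return updated


hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
memory = [[1.0, 1.0], [-1.0, 0.5]]
update_mask = [1, 0, 1]

closed = gated_tree_cross_attention(hidden, memory, update_mask, gate=0.0)
opened = gated_tree_cross_attention(hidden, memory, update_mask, gate=1.0)
print(closed == hidden)        # gate closed: backbone states pass through unchanged
print(opened[1] == hidden[1])  # masked token is untouched even with the gate open
```

Under these assumptions, the gate governs how strongly structure is injected (something a staged training schedule could open gradually), while the mask governs which tokens receive the update at all.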

Problem

Research questions and friction points this paper is trying to address.

syntactic robustness
decoder-only LLMs
grammar perturbations
checkpoint compatibility
syntax injection

Innovation

Methods, ideas, or system contributions that make the work stand out.

gated tree cross-attention
syntax injection
checkpoint-compatible
decoder-only LLMs
syntactic robustness