AI Summary
This work proposes an approach to enhancing the syntactic generalization of language models without introducing additional structural tokens. The method injects features from semantic dependency graphs into the Transformer's attention mechanism, modulating attention weights so that structural information enters language modeling implicitly. The dependency graph is constructed incrementally alongside token prediction and applied directly to attention modulation, so the input sequence is left unchanged. This strategy keeps perplexity competitive with baseline Transformer language models while improving syntactic generalization. The approach can also be applied by fine-tuning a pretrained language model, which yields improved performance on downstream tasks.
Abstract
Augmenting Transformers with linguistic structures effectively enhances the syntactic generalization performance of language models. Previous work in this direction focuses on syntactic tree structures of languages, in particular constituency tree structures. We propose Graph-Infused Layers Transformer Language Model (GiLT) which leverages dependency graphs for augmenting Transformer language models. Unlike most previous work, GiLT does not insert extra structural tokens in language modeling; instead, it injects structural information into language modeling by modulating attention weights in the Transformer with features extracted from the dependency graph that is incrementally constructed along with token prediction. In our experiments, GiLT with semantic dependency graphs achieves better syntactic generalization while maintaining competitive perplexity in comparison with Transformer language model baselines. In addition, GiLT can be finetuned from a pretrained language model to achieve improved downstream task performance. Our code is released at https://github.com/cookie-pie-oops/GiLT-LM.
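The idea of modulating attention weights with graph features can be illustrated with a small sketch. The abstract does not specify how GiLT extracts features or applies them, so everything here is an assumption made for illustration: the feature is a single "these two tokens share a dependency edge" indicator, the modulation is an additive bias on the attention logits, and the names `graph_bias` and `modulated_causal_attention` are hypothetical. Because attention is causal, an edge only affects the row of its later token, which loosely mirrors a graph that grows as tokens are predicted.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def graph_bias(edges, n, weight):
    """Hypothetical feature: `weight` where tokens i and j share an edge, else 0."""
    bias = [[0.0] * n for _ in range(n)]
    for head, dep in edges:
        bias[head][dep] = weight
        bias[dep][head] = weight
    return bias


def modulated_causal_attention(queries, keys, edges, weight=1.0):
    """Causal attention weights with an additive graph bias on the logits.

    Assumption: additive modulation. The actual GiLT mechanism may differ.
    """
    n = len(queries)
    d = len(queries[0])
    bias = graph_bias(edges, n, weight)
    rows = []
    for i in range(n):
        # Position i attends only to positions 0..i, so only edges among
        # tokens seen so far can influence this row.
        logits = [
            sum(q * k for q, k in zip(queries[i], keys[j])) / math.sqrt(d)
            + bias[i][j]
            for j in range(i + 1)
        ]
        # Pad masked (future) positions with zero weight.
        rows.append(softmax(logits) + [0.0] * (n - i - 1))
    return rows


# Toy input: identical zero vectors, so unbiased attention would be uniform.
queries = [[0.0, 0.0] for _ in range(3)]
keys = [[0.0, 0.0] for _ in range(3)]

# One dependency edge between token 0 and token 2.
weights = modulated_causal_attention(
    queries, keys, edges=[(0, 2)], weight=math.log(2)
)
```

In this toy case the third token would otherwise attend uniformly to all three positions; the edge bias shifts attention toward token 0 without adding any token to the input sequence, which is the property the abstract emphasizes.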