🤖 AI Summary
This study investigates whether positional encoding designs in language models align with morphologically complex and syntactically flexible languages, testing the "morphological complexity–word-order flexibility trade-off" hypothesis in architectural selection. We pretrain monolingual Transformer models equipped with absolute, relative, or no positional encoding on seven typologically diverse languages and evaluate them systematically across four downstream tasks, constituting the first cross-lingual empirical test of the theoretical trade-off. Results show no statistically significant correlation between positional encoding performance and either morphological complexity or word-order flexibility. Instead, task type, evaluation metric, and language selection critically affect result stability. The findings indicate that mainstream positional encoding mechanisms lack the hypothesized typological adaptivity, challenging the common assumption that architectural design must be customized to linguistic typology.
📝 Abstract
Language model architectures are predominantly designed for English first and subsequently applied to other languages. Whether this architectural bias degrades performance for languages that are structurally different from English remains an open question. We examine one specific architectural choice, positional encodings, through the lens of the trade-off hypothesis: the supposed interplay between morphological complexity and word-order flexibility. This hypothesis posits a trade-off between the two: a morphologically more complex language can afford a more flexible word order, and vice versa. Positional encodings are a direct target for investigating the implications of this hypothesis for language modelling. We pretrain monolingual model variants with absolute, relative, and no positional encodings for seven typologically diverse languages and evaluate them on four downstream tasks. Contrary to previous findings, we do not observe a clear interaction between positional encodings and morphological complexity or word-order flexibility, as measured by various proxies. Our results show that the choice of tasks, languages, and metrics is essential for drawing stable conclusions.
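The abstract contrasts absolute, relative, and no positional encodings. As a minimal illustration of the absolute variant, the sketch below implements the standard sinusoidal scheme from the original Transformer; this is a generic reference implementation, not necessarily the exact variant used in the paper's experiments.

```python
import math

def sinusoidal_positional_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of absolute sinusoidal encodings.

    Even dimensions hold sin(pos / 10000^(i/d_model)), odd dimensions the
    matching cosine, so every position gets a unique, fixed vector.
    """
    table = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)
    return table

# Position 0 encodes as sin(0) = 0 in even dims and cos(0) = 1 in odd dims.
pe = sinusoidal_positional_encoding(seq_len=4, d_model=8)
```

Because these vectors are a fixed function of absolute position, they inject word-order information directly; a "no positional encoding" variant, by contrast, forces the model to rely on other cues, which is what makes the comparison across flexible- and fixed-order languages informative.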