AI Summary
Emerging non-Transformer architectures, including xLSTM, structured state-space models (SSMs), diffusion models, and adversarial learning frameworks, face challenges in multilingual, hierarchical sequence labeling tasks characterized by complex label topologies (e.g., nested and discontinuous entities) and strong token-level dependencies, and they transfer poorly across languages, especially to low-resource ones.
Method: We establish a unified multitask benchmark to systematically evaluate the adaptability and generalization of these architectures across diverse languages and labeling granularities, enabling the first large-scale, cross-lingual, cross-task comparative analysis.
Contribution/Results: While several non-Transformer models match Transformer performance on simple flat-labeling tasks, they exhibit substantial degradation on structurally complex annotations and low-resource languages, revealing intrinsic architectural limitations in modeling long-range, hierarchical, and cross-lingual dependencies. Our empirical analysis identifies critical structural bottlenecks, particularly in dependency capture and compositional generalization, providing foundational evidence and concrete directions for next-generation sequence modeling.
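To make the distinction between "simple flat-labeling tasks" and "complex label topologies" concrete, the sketch below (a hypothetical illustration, not taken from the paper or its benchmark) contrasts flat BIO tagging, where each token gets exactly one label, with nested entity annotation, where overlapping spans cannot be encoded in a single label sequence:

```python
# Hypothetical example contrasting flat vs. nested sequence labeling.
tokens = ["The", "New", "York", "Times", "reporter", "arrived"]

# Flat BIO tagging: one label per token, no overlap allowed.
flat_bio = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O"]

# Nested annotation: entities are (start, end, type) spans that may overlap,
# so a single per-token label sequence cannot represent them.
nested_spans = [
    (1, 4, "ORG"),  # "New York Times"
    (1, 3, "LOC"),  # "New York", nested inside the ORG span
]

def spans_overlap(a, b):
    """True if two (start, end, type) spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]

# The two spans overlap -- exactly the structure flat BIO cannot encode.
assert spans_overlap(nested_spans[0], nested_spans[1])
assert len(flat_bio) == len(tokens)
```

Discontinuous entities (spans with gaps) break the flat-label assumption in the same way, which is why models that do well on flat tagging may still struggle on these topologies.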
Abstract
Pretrained Transformer encoders are the dominant approach to sequence labeling. While some alternative architectures, such as xLSTMs, structured state-space models, diffusion models, and adversarial learning, have shown promise in language modeling, few have been applied to sequence labeling, and mostly to flat or simplified tasks. We study how these architectures adapt across tagging tasks that vary in structural complexity, label space, and token dependencies, with evaluation spanning multiple languages. We find that the strong performance previously observed in simpler settings does not always generalize well across languages or datasets, nor does it extend to more complex structured tasks.