AI Summary
Emerging non-Transformer architectures, including xLSTM, structured state-space models (SSMs), diffusion models, and adversarial learning frameworks, face challenges in multilingual, hierarchical sequence labeling tasks characterized by complex label topologies (e.g., nested and discontinuous entities) and strong token-level dependencies, and they transfer poorly across languages, especially to low-resource ones.
Method: We establish a unified multitask benchmark to systematically evaluate the adaptability and generalization of these architectures across diverse languages and labeling granularities, enabling the first large-scale, cross-lingual, cross-task comparative analysis.
Contribution/Results: While several non-Transformer models match Transformer performance on simple flat-labeling tasks, they exhibit substantial degradation on structurally complex annotations and low-resource languages, revealing intrinsic architectural limitations in modeling long-range, hierarchical, and cross-lingual dependencies. Our empirical analysis identifies critical structural bottlenecks, particularly in dependency capture and compositional generalization, providing foundational evidence and concrete directions for next-generation sequence modeling.
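To make the distinction between "simple flat-labeling tasks" and "complex label topologies" concrete, the sketch below (a hypothetical illustration, not taken from the paper or its benchmark) contrasts flat BIO tagging, where each token gets exactly one label, with nested entity annotation, where overlapping spans cannot be encoded in a single label sequence:

```python
# Hypothetical example contrasting flat vs. nested sequence labeling.
tokens = ["The", "New", "York", "Times", "reporter", "arrived"]

# Flat BIO tagging: one label per token, no overlap allowed.
flat_bio = ["O", "B-ORG", "I-ORG", "I-ORG", "O", "O"]

# Nested annotation: entities are (start, end, type) spans that may overlap,
# so a single per-token label sequence cannot represent them.
nested_spans = [
    (1, 4, "ORG"),  # "New York Times"
    (1, 3, "LOC"),  # "New York", nested inside the ORG span
]

def spans_overlap(a, b):
    """True if two (start, end, type) spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]

# The two spans overlap -- exactly the structure flat BIO cannot encode.
assert spans_overlap(nested_spans[0], nested_spans[1])
assert len(flat_bio) == len(tokens)
```

Discontinuous entities (spans with gaps) break the flat-label assumption in the same way, which is why models that do well on flat tagging may still struggle on these topologies.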
Abstract
Pretrained Transformer encoders are the dominant approach to sequence labeling. While some alternative architectures, such as xLSTMs, structured state-space models, diffusion models, and adversarial learning, have shown promise in language modeling, few have been applied to sequence labeling, and mostly to flat or simplified tasks. We study how these architectures adapt across tagging tasks that vary in structural complexity, label space, and token dependencies, with evaluation spanning multiple languages. We find that the strong performance previously observed in simpler settings does not always generalize well across languages or datasets, nor does it extend to more complex structured tasks.