🤖 AI Summary
This paper addresses persistent challenges in multiword expression (MWE) research—including annotation inconsistency, limited modeling generalizability, and poor cross-lingual and cross-task transfer—identified through a systematic two-decade review. Methodologically, it integrates bibliometric analysis, longitudinal examination of MWE Workshop themes, cross-framework comparison between MWE resources and the Universal Dependencies (UD) formalism, and stakeholder needs mapping across academia, industry, and education. The study constructs the first comprehensive MWE research timeline and thematic map. Its key contributions include: (i) proposing the “MWE–UD co-evolution” framework to unify theoretical bottlenecks and catalyze paradigm shifts; (ii) developing an extensible annotation protocol, a dynamic MWE recognition interface, and an education–industry adoption roadmap; and (iii) providing foundational theory and actionable pathways for deep integration of MWEs into computational linguistics and NLP pipelines.
📝 Abstract
The first MWE workshop was held with ACL in Sapporo, Japan, in 2003. This year, the joint MWE-UD workshop, co-located with the LREC-COLING 2024 conference, marked the 20th anniversary of MWE workshop events. Standing at this milestone, we look back at the workshop series and summarise the research topics and methodologies that researchers have pursued over the years. We also discuss the current challenges facing the field and the broader impacts and synergies of MWE research within the CL and NLP fields. Finally, we offer perspectives on future research. We hope this position paper can help researchers, students, and industrial practitioners interested in MWEs gain a brief, accessible understanding of the field's history, current state, and possible future.