🤖 AI Summary
This work addresses the end-to-end generation of natural language text from multilingual Uniform Meaning Representation (UMR) graphs, a challenging task due to the severe scarcity of annotated UMR data. To bridge this gap, we establish the first comprehensive UMR-to-text generation framework, proposing three complementary approaches: (1) structural conversion from UMR to Abstract Meaning Representation (AMR) to leverage existing AMR-to-text models; (2) few-shot fine-tuning of multilingual large language models (LLMs); and (3) lightweight adaptation of AMR graph encoders via architecture modification. We further introduce multilingual BERTScore for cross-lingual consistency evaluation. Experimental results show that our best-performing model achieves multilingual BERTScore values of 0.825 (English) and 0.882 (Chinese), significantly outperforming all baselines. This is the first systematic empirical validation of UMR's feasibility and effectiveness as a cross-lingual semantic intermediate representation for controllable text generation.
📝 Abstract
Uniform Meaning Representation (UMR) is a recently developed graph-based semantic representation that expands on Abstract Meaning Representation (AMR) in a number of ways, in particular through the inclusion of document-level information and multilingual flexibility. To effectively adopt and leverage UMR for downstream tasks, effort must be directed toward developing a UMR technological ecosystem. Though only limited amounts of UMR annotation have been produced to date, in this work we investigate the first approaches to producing text from multilingual UMR graphs: (1) a pipeline that converts UMR to AMR and then applies AMR-to-text generation models, (2) fine-tuning large language models with UMR data, and (3) fine-tuning existing AMR-to-text generation models with UMR data. Our best-performing model achieves a multilingual BERTScore of 0.825 for English and 0.882 for Chinese when compared to the reference, a promising indication of the effectiveness of fine-tuning approaches for UMR-to-text generation even with limited amounts of UMR data.
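To make the pipeline approach (1) concrete, here is a minimal, hypothetical sketch of the structural-conversion idea: treat a sentence-level UMR graph as a list of triples and drop the annotation layers UMR adds on top of AMR (e.g., aspect and modal-strength attributes). The role names, the toy graph, and the `umr_to_amr_triples` helper are illustrative assumptions, not the paper's actual conversion rules.

```python
# Illustrative sketch only: the paper's actual UMR-to-AMR conversion rules
# are not reproduced here. We assume a simplified triple encoding of a
# sentence-level UMR graph and remove attribute triples that UMR layers on
# top of AMR. The role names below are assumptions for illustration.

# Roles assumed (hypothetically) to be UMR-specific additions absent from AMR.
UMR_ONLY_ROLES = {":aspect", ":modal-strength", ":modstr",
                  ":refer-number", ":refer-person"}

def umr_to_amr_triples(triples):
    """Approximate an AMR graph by filtering out UMR-specific attribute triples."""
    return [(src, role, tgt) for (src, role, tgt) in triples
            if role not in UMR_ONLY_ROLES]

# Toy sentence-level graph for "the boy left" (invented for illustration).
umr = [
    ("l", ":instance", "leave-01"),
    ("l", ":ARG0", "b"),
    ("b", ":instance", "boy"),
    ("l", ":aspect", "performance"),               # UMR aspect annotation
    ("l", ":modal-strength", "full-affirmative"),  # UMR modal annotation
]

amr = umr_to_amr_triples(umr)  # only the AMR-compatible triples remain
```

The filtered triples could then be serialized back to PENMAN notation and fed to an off-the-shelf AMR-to-text model; in this sketch, the two UMR-specific attribute triples are discarded and the core predicate-argument structure survives.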