🤖 AI Summary
This paper presents the first systematic study of multilingual definition modeling, focusing on Spanish, French, Portuguese, and German. Methodologically, it fine-tunes multilingual pretrained models (e.g., mBERT, XLM-R) on monolingual dictionary data and conducts zero-shot evaluation of large language models (LLMs) such as ChatGPT and the Llama series; evaluation combines BERTScore with human assessment. Key contributions: (1) showing that current multilingual models reach English-level performance in each language but fail to exploit cross-lingual synergies; (2) demonstrating strong zero- and few-shot definition-generation capability of LLMs, with higher naturalness and stability; and (3) identifying a strong correlation between BERTScore on this task and mainstream multilingual LLM benchmarks, supporting its use as a lightweight, interpretable alternative for multilingual evaluation.
📝 Abstract
In this paper, we propose the first multilingual study on definition modeling. We use monolingual dictionary data for four new languages (Spanish, French, Portuguese, and German) and perform an in-depth empirical study to test the performance of pre-trained multilingual language models on definition modeling of monosemic words when fine-tuned on this data. Furthermore, we use a zero-shot approach to test the multilingual capabilities of two popular chat-based Large Language Models (LLMs) on the task. Results show that multilingual language models can perform on par with English but cannot leverage potential cross-lingual synergies, with LLMs generally offering better performance overall. A comprehensive human evaluation of the LLM-generated definitions highlights the zero- and few-shot capabilities of these models on this new task, while also exposing their shortcomings. Finally, we show that performance on our task measured via BERTScore correlates strongly with performance on multilingual LLM benchmarks, suggesting that our task offers a viable compute-constrained, stable, and natural alternative to them.
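Since BERTScore is the paper's central automatic metric, a minimal sketch of its greedy-matching F1 may help. Real BERTScore computes cosine similarity between contextual token embeddings from a pretrained encoder (typically via the `bert-score` package); the random/identity embeddings below are stand-ins for illustration only, not the paper's actual setup.

```python
import numpy as np

def bertscore_f1(cand_emb: np.ndarray, ref_emb: np.ndarray) -> float:
    """Greedy-matching F1 in the style of BERTScore.

    cand_emb, ref_emb: (num_tokens, dim) arrays of token embeddings.
    In the real metric these come from a contextual encoder; any
    embeddings work here for demonstration purposes.
    """
    # Normalize rows so dot products become cosine similarities.
    c = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    r = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)
    sim = c @ r.T  # pairwise cosine-similarity matrix

    # Each candidate token greedily matches its best reference token
    # (precision); each reference token its best candidate token (recall).
    precision = sim.max(axis=1).mean()
    recall = sim.max(axis=0).mean()
    return float(2 * precision * recall / (precision + recall))

# Identical token embeddings yield a perfect score of 1.0.
emb = np.eye(3)
print(bertscore_f1(emb, emb))  # → 1.0
```

The greedy token matching is what makes the metric robust to word order and paraphrase, which is presumably why it suits generated definitions better than n-gram overlap metrics.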