Abstract
In this paper, we present our submission for the token prediction task of EvaCun 2025. Our systems are based on LLMs (Command-R, Mistral, and Aya Expanse) fine-tuned on the task data provided by the organizers. As we possess only a very superficial knowledge of the subject field and the languages of the task, we used the training data without any task-specific adjustments, preprocessing, or filtering. We compare three different approaches (based on three different prompts) to obtaining the predictions, and we evaluate them on a held-out part of the data.