🤖 AI Summary
This work addresses the lack of evaluation frameworks for the conceptual understanding of large language models (LLMs) in multilingual settings, particularly for low-resource and morphologically complex languages, by introducing XCOMPS, the first cross-lingual conceptual minimal-pair benchmark, covering 17 languages. Methodologically, it proposes a cross-lingual conceptual minimal-pair paradigm that integrates metalinguistic prompting, direct probability measurement, neurolinguistic probing, and multi-model comparison (base, instruction-tuned, and knowledge-distilled variants). Key contributions include: (1) the first empirical demonstration of significant cross-lingual imbalance in LLMs' conceptual understanding; (2) identification of a decoupling effect, whereby instruction tuning improves explicit task performance but not intrinsic conceptual competence, whereas knowledge distillation enhances the latter without substantial gains on explicit tasks; (3) confirmation that morphological complexity correlates positively with the network depth required for conceptual reasoning; and (4) identification of semantically similar distractors as a critical challenge. Results further show that knowledge distillation improves intrinsic conceptual ability in low-resource languages, albeit with limited benefits on explicit tasks.
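As a rough illustration of the minimal-pair, direct-probability-measurement setup described above, the sketch below scores an acceptable sentence and its minimally different counterpart under a causal LM and counts the item as correct when the acceptable sentence receives the higher summed log-probability. The model name and the example pair are placeholders for illustration, not items from XCOMPS or details of the authors' implementation.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM checkpoint would work here

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Summed log-probability of a sentence under the causal LM."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        logits = model(ids).logits
    # Each position predicts the next token; the first token is left unscored.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]
    token_scores = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_scores.sum().item()

def prefers_acceptable(acceptable: str, unacceptable: str) -> bool:
    """True if the model assigns higher probability to the acceptable sentence."""
    return sentence_logprob(acceptable) > sentence_logprob(unacceptable)

# Hypothetical concept-property minimal pair (illustrative, not from XCOMPS).
print(prefers_acceptable("A knife can cut bread.", "A spoon can cut bread."))
```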
📝 Abstract
In this work, we introduce XCOMPS, a multilingual conceptual minimal-pair dataset covering 17 languages. Using this dataset, we evaluate LLMs' multilingual conceptual understanding through metalinguistic prompting, direct probability measurement, and neurolinguistic probing. By comparing base, instruction-tuned, and knowledge-distilled models, we find that: 1) LLMs exhibit weaker conceptual understanding for low-resource languages, and accuracy varies across languages even when the models are tested on the same concept sets. 2) LLMs excel at distinguishing concept-property pairs that are visibly different but show a marked performance drop when negative pairs share subtle semantic similarities. 3) Instruction tuning improves explicit performance on concept-understanding tasks but does not enhance internal conceptual competence; knowledge distillation can enhance internal conceptual competence for low-resource languages, with limited gains in explicit task performance. 4) More morphologically complex languages yield lower concept-understanding scores and require deeper layers for conceptual reasoning.
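To make the neurolinguistic-probing idea concrete, here is a minimal sketch, not the authors' implementation, that fits a logistic-regression probe on each layer's final-token hidden state and reports per-layer accuracy; the depth at which the probe begins to separate acceptable from unacceptable sentences gives a rough notion of the layers required, which finding 4 relates to morphological complexity. The model name, the sentence pairs, and the probe setup are illustrative assumptions.

```python
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM exposing hidden states works

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def layerwise_last_token_states(sentence: str) -> list[np.ndarray]:
    """Hidden state of the final token at every layer (embeddings + each block)."""
    ids = tokenizer(sentence, return_tensors="pt")["input_ids"]
    with torch.no_grad():
        hidden_states = model(ids).hidden_states  # tuple of [1, seq, dim] tensors
    return [h[0, -1].float().numpy() for h in hidden_states]

# Illustrative placeholder pairs; a real run would iterate over one language's
# XCOMPS items (acceptable sentence, minimally different unacceptable sentence).
pairs = [
    ("A knife can cut bread.", "A spoon can cut bread."),
    ("A sponge can absorb water.", "A stone can absorb water."),
    ("A kettle can boil water.", "A basket can boil water."),
]

features, labels = [], []
for acceptable, unacceptable in pairs:
    features.append(layerwise_last_token_states(acceptable)); labels.append(1)
    features.append(layerwise_last_token_states(unacceptable)); labels.append(0)

labels = np.array(labels)
for layer in range(len(features[0])):
    X = np.stack([f[layer] for f in features])
    probe = LogisticRegression(max_iter=1000)
    acc = cross_val_score(probe, X, labels, cv=3).mean()
    print(f"layer {layer:2d}: probe accuracy {acc:.2f}")
```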