🤖 AI Summary
This study addresses the challenge of evaluating the reliability of multilingual large language models (LLMs) under knowledge conflict. Methodologically, we propose the first knowledge-conflict-aware RDF framework, systematically assessing German–English bilingual LLMs across four knowledge conditions: complete, incomplete, conflicting, and no-context. Our approach introduces a domain-adaptive RDF ontology integrating knowledge conflict injection, multilingual prompt engineering, and a structured evaluation lexicon; it further establishes a three-dimensional quality assessment paradigm spanning knowledge leakage detection, error identification, and cross-lingual consistency. Empirical evaluation on 28 fire-safety tasks reveals pervasive context-priority bias and substantial language-specific performance disparities. The proposed ontology is empirically validated to cover all observed reliability dimensions, providing a reusable methodological foundation for trustworthy multilingual knowledge assessment.
📝 Abstract
Large Language Models (LLMs) increasingly serve as knowledge interfaces, yet systematically assessing their reliability when faced with conflicting information remains difficult. We propose an RDF-based framework for assessing multilingual LLM quality, focusing on knowledge conflicts. Our approach captures model responses across four distinct context conditions (complete, incomplete, conflicting, and no-context) in German and English. This structured representation enables comprehensive analysis of knowledge leakage (where models favor training data over the provided context), error detection, and multilingual consistency. We demonstrate the framework through a fire safety domain experiment, revealing critical patterns in context prioritization and language-specific performance, and showing that our vocabulary sufficed to express every assessment facet encountered in the 28-question study.
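To make the structured representation concrete, the sketch below shows how one model response under a given context condition might be recorded as RDF-style triples. This is a minimal illustration, not the paper's actual ontology: the namespace, class, and property names (`Assessment`, `contextCondition`, `knowledgeLeakage`, etc.) are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical namespace; the paper's real ontology IRIs are not shown here.
EX = "http://example.org/llm-eval#"

# The four context conditions from the abstract.
CONDITIONS = {"complete", "incomplete", "conflicting", "no-context"}


@dataclass
class Assessment:
    question_id: str         # e.g. "Q07"
    language: str            # "de" or "en"
    condition: str           # one of CONDITIONS
    response: str            # raw model answer
    knowledge_leakage: bool  # model preferred training data over context

    def to_triples(self) -> list[str]:
        """Serialize one assessment as simple N-Triples-style strings."""
        assert self.condition in CONDITIONS, f"unknown condition: {self.condition}"
        subj = f"<{EX}{self.question_id}-{self.language}-{self.condition}>"
        return [
            f'{subj} <{EX}questionId> "{self.question_id}" .',
            f'{subj} <{EX}language> "{self.language}" .',
            f'{subj} <{EX}contextCondition> "{self.condition}" .',
            f'{subj} <{EX}responseText> "{self.response}" .',
            f'{subj} <{EX}knowledgeLeakage> "{str(self.knowledge_leakage).lower()}" .',
        ]


# One German response under the conflicting-context condition,
# flagged as knowledge leakage.
record = Assessment("Q07", "de", "conflicting", "Der Feuerloescher ist rot.", True)
for triple in record.to_triples():
    print(triple)
```

Encoding each (question, language, condition) combination as its own subject IRI keeps the four conditions and two languages queryable side by side, which is what enables the leakage, error, and cross-lingual consistency analyses described above.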