🤖 AI Summary
This study addresses the challenge of evaluating the reliability of multilingual large language models (LLMs) under knowledge conflict. Methodologically, we propose the first knowledge-conflict-aware RDF framework, systematically assessing German–English bilingual LLMs across four knowledge conditions: complete, incomplete, conflicting, and no-context. Our approach introduces a domain-adaptive RDF ontology integrating knowledge conflict injection, multilingual prompt engineering, and a structured evaluation lexicon; it further establishes a three-dimensional quality assessment paradigm spanning knowledge leakage detection, error identification, and cross-lingual consistency. Empirical evaluation on 28 fire-safety tasks reveals pervasive context-priority bias and substantial language-specific performance disparities. The proposed ontology is empirically validated to cover all observed reliability dimensions, providing a reusable methodological foundation for trustworthy multilingual knowledge assessment.
📝 Abstract
Large Language Models (LLMs) increasingly serve as knowledge interfaces, yet systematically assessing their reliability when faced with conflicting information remains difficult. We propose an RDF-based framework for assessing multilingual LLM quality, focusing on knowledge conflicts. Our approach captures model responses across four distinct context conditions (complete, incomplete, conflicting, and no-context) in German and English. This structured representation enables comprehensive analysis of knowledge leakage (where models favor training data over the provided context), error detection, and multilingual consistency. We demonstrate the framework through a fire safety domain experiment, revealing critical patterns in context prioritization and language-specific performance, and showing that our vocabulary sufficed to express every assessment facet encountered in the 28-question study.
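To make the structured representation concrete, the sketch below shows how one model response under a given context condition might be recorded as RDF-style triples. This is a minimal illustration, not the paper's actual ontology: the namespace, class, and property names (`Assessment`, `contextCondition`, `knowledgeLeakage`, etc.) are hypothetical placeholders.

```python
from dataclasses import dataclass

# Hypothetical namespace; the paper's real ontology IRIs are not shown here.
EX = "http://example.org/llm-eval#"

# The four context conditions from the abstract.
CONDITIONS = {"complete", "incomplete", "conflicting", "no-context"}


@dataclass
class Assessment:
    question_id: str         # e.g. "Q07"
    language: str            # "de" or "en"
    condition: str           # one of CONDITIONS
    response: str            # raw model answer
    knowledge_leakage: bool  # model preferred training data over context

    def to_triples(self) -> list[str]:
        """Serialize one assessment as simple N-Triples-style strings."""
        assert self.condition in CONDITIONS, f"unknown condition: {self.condition}"
        subj = f"<{EX}{self.question_id}-{self.language}-{self.condition}>"
        return [
            f'{subj} <{EX}questionId> "{self.question_id}" .',
            f'{subj} <{EX}language> "{self.language}" .',
            f'{subj} <{EX}contextCondition> "{self.condition}" .',
            f'{subj} <{EX}responseText> "{self.response}" .',
            f'{subj} <{EX}knowledgeLeakage> "{str(self.knowledge_leakage).lower()}" .',
        ]


# One German response under the conflicting-context condition,
# flagged as knowledge leakage.
record = Assessment("Q07", "de", "conflicting", "Der Feuerloescher ist rot.", True)
for triple in record.to_triples():
    print(triple)
```

Encoding each (question, language, condition) combination as its own subject IRI keeps the four conditions and two languages queryable side by side, which is what enables the leakage, error, and cross-lingual consistency analyses described above.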