🤖 AI Summary
Large language models (LLMs) have limited capacity for dataset-wide sensemaking and system-level reasoning in safety-critical aviation maintenance scenarios. To address this, the paper proposes KEO, a knowledge graph-enhanced retrieval-augmented generation (KG-RAG) framework. The method uses the OMIn operations and maintenance dataset to construct a domain-specific knowledge graph that explicitly models entity relationships, so that locally deployable lightweight LLMs (Gemma-3, Phi-4, and Mistral-Nemo) can connect information across text fragments and reason about causes. Compared with conventional chunk-based RAG, the KG-based pipeline improves system-level reasoning, with a reported 23.6% accuracy improvement on multi-hop reasoning and root-cause fault analysis tasks, while chunk-based RAG remains effective for fine-grained procedural questions that need localized retrieval. Answers are scored with GPT-4o and Llama-3.3 as judge models, and the results support the use of KG-augmented LLMs for high-stakes operational decision-making.
📝 Abstract
We present Knowledge Extraction on OMIn (KEO), a domain-specific knowledge extraction and reasoning framework for large language models (LLMs) in safety-critical contexts. Using the Operations and Maintenance Intelligence (OMIn) dataset, we construct a QA benchmark spanning global sensemaking and actionable maintenance tasks. KEO builds a structured Knowledge Graph (KG) and integrates it into a retrieval-augmented generation (RAG) pipeline, enabling more coherent, dataset-wide reasoning than traditional text-chunk RAG. We evaluate locally deployable LLMs (Gemma-3, Phi-4, Mistral-Nemo) and employ stronger models (GPT-4o, Llama-3.3) as judges. Experiments show that KEO markedly improves global sensemaking by revealing patterns and system-level insights, while text-chunk RAG remains effective for fine-grained procedural tasks requiring localized retrieval. These findings underscore the promise of KG-augmented LLMs for secure, domain-specific QA and their potential in high-stakes reasoning.
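To make the contrast with text-chunk retrieval concrete, the following is a minimal Python sketch of graph-based retrieval. It is not KEO's implementation. The triples, relation names, and the functions `build_graph`, `retrieve_subgraph`, and `build_prompt` are hypothetical, and entity matching is reduced to substring lookup. The sketch gathers facts within a few hops of the entities named in a question, so that linked facts from different records reach the model together.

```python
from collections import defaultdict, deque

# Hypothetical triples for illustration; not taken from the OMIn dataset.
TRIPLES = [
    ("engine", "has_component", "fuel pump"),
    ("fuel pump", "failure_mode", "pressure loss"),
    ("pressure loss", "leads_to", "forced landing"),
    ("landing gear", "failure_mode", "hydraulic leak"),
]


def build_graph(triples):
    """Index each triple under both endpoints so traversal ignores direction."""
    graph = defaultdict(list)
    for head, relation, tail in triples:
        graph[head].append((head, relation, tail))
        graph[tail].append((head, relation, tail))
    return graph


def retrieve_subgraph(graph, question, max_hops=2):
    """Collect triples within max_hops of any entity named in the question."""
    text = question.lower()
    # Assumption: naive substring matching stands in for real entity linking.
    seeds = [entity for entity in graph if entity in text]
    seen = set(seeds)
    facts = []
    queue = deque((entity, 0) for entity in seeds)
    while queue:
        entity, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for triple in graph[entity]:
            if triple not in facts:
                facts.append(triple)
            for neighbor in (triple[0], triple[2]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append((neighbor, depth + 1))
    return facts


def build_prompt(question, facts):
    """Serialize the retrieved triples as context ahead of the question."""
    lines = [f"{h} --{r}--> {t}" for h, r, t in facts]
    return "Known facts:\n" + "\n".join(lines) + f"\n\nQuestion: {question}"


question = "What can a fuel pump problem lead to?"
facts = retrieve_subgraph(build_graph(TRIPLES), question)
prompt = build_prompt(question, facts)
```

In this toy graph, the question mentions only the fuel pump, yet the two-hop traversal also surfaces the downstream "forced landing" fact and leaves out the unrelated landing gear triple. A chunk retriever would return that chain only if a single chunk happened to contain it.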