🤖 AI Summary
To address the three types of hallucination in large language models (LLMs), input-conflicting, context-conflicting, and fact-conflicting, which are interdependent yet usually treated in isolation, this paper proposes MALM, a Multi-Information Adapter for LLMs. MALM is the first method to explicitly model the dependencies among these hallucination categories via plug-and-play multilayer graph attention adapters, jointly capturing deep interactions among raw inputs, conversational context, and external knowledge, and it integrates naturally with retrieval-augmented generation (RAG) frameworks. Evaluated on four benchmarks including HaluEval, MALM significantly outperforms LLaMA-2; GPT-4 and human evaluators prefer its responses in 79.4% and 65.6% of cases, respectively. The approach works with seven mainstream LLMs and three representative retrievers, demonstrating strong cross-model generalization and deployment flexibility.
📝 Abstract
Large language models (LLMs) are prone to three types of hallucination: Input-Conflicting, Context-Conflicting, and Fact-Conflicting hallucinations. The purpose of this study is to mitigate these different types of hallucination by exploiting the interdependence between them. To this end, we propose a Multi-Information Adapter for Large Language Models (MALM). This framework employs a tailored multi-graph learning approach designed to elucidate the interconnections between original inputs, contextual information, and external factual knowledge, thereby alleviating all three categories of hallucination within a cohesive framework. Experiments were carried out on four benchmark datasets: HaluEval, TruthfulQA, Natural Questions, and TriviaQA. We evaluated the proposed framework in two respects: (1) adaptability to different base LLMs on HaluEval and TruthfulQA, to confirm whether MALM remains effective when applied to seven typical LLMs, where MALM showed significant improvements over LLaMA-2; and (2) generalizability to retrieval-augmented generation (RAG) by combining MALM with three representative retrievers (BM25, Spider, and DPR) separately. Furthermore, automated and human evaluations were conducted to substantiate the correctness of the experimental results: GPT-4 and three human volunteers judged which response was better between LLaMA-2 and MALM, and they preferred MALM in 79.4% and 65.6% of cases, respectively. The results validate that incorporating the complex interactions between the three types of hallucination, through a multilayered graph attention network, into the LLM generation process is effective in mitigating them. The adapter design is also shown to be flexible and robust across different base LLMs.
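To make the adapter idea concrete, here is a minimal PyTorch sketch of how a plug-and-play multilayer graph attention adapter over the three information sources might look. This is an illustration under stated assumptions, not the paper's actual implementation: the class name `MultiInfoGraphAdapter`, the use of standard multi-head self-attention over a fully connected three-node graph, the pooled per-source embeddings, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class MultiInfoGraphAdapter(nn.Module):
    """Hypothetical sketch: stacked graph-attention layers over three
    information sources (raw input, context, external knowledge).
    Names and architecture details are assumptions, not MALM's code."""

    def __init__(self, hidden_dim: int, num_layers: int = 2, num_heads: int = 4):
        super().__init__()
        # One attention block per layer; the three pooled source embeddings
        # form the nodes of a small, fully connected graph.
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(hidden_dim) for _ in range(num_layers))
        self.out_proj = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, input_emb, context_emb, knowledge_emb):
        # Stack the three pooled source embeddings as graph nodes:
        # shape (batch, 3, hidden_dim).
        nodes = torch.stack([input_emb, context_emb, knowledge_emb], dim=1)
        for attn, norm in zip(self.layers, self.norms):
            # Self-attention over all node pairs plays the role of
            # graph attention on a complete 3-node graph.
            updated, _ = attn(nodes, nodes, nodes)
            nodes = norm(nodes + updated)  # residual + layer norm
        # Fuse the updated nodes into one conditioning vector that could
        # be injected into the base LLM's hidden states at generation time.
        fused = nodes.mean(dim=1)
        return self.out_proj(fused)

if __name__ == "__main__":
    batch, dim = 2, 64
    adapter = MultiInfoGraphAdapter(hidden_dim=dim)
    x = torch.randn(batch, dim)  # pooled raw-input representation
    c = torch.randn(batch, dim)  # pooled conversational-context representation
    k = torch.randn(batch, dim)  # pooled retrieved-knowledge representation
    print(adapter(x, c, k).shape)  # torch.Size([2, 64])
```

In a RAG setting, the knowledge embedding would come from documents returned by a retriever such as BM25, Spider, or DPR; because the adapter only consumes pooled embeddings, swapping retrievers or base LLMs leaves the adapter itself unchanged, which is consistent with the flexibility the abstract reports.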