🤖 AI Summary
Large language models (LLMs) frequently exhibit factual hallucinations even on simple question-answering tasks. Method: This paper introduces LP-LM, a logic programming–driven, zero-hallucination QA system. It employs definite clause grammars (DCGs) to semantically parse natural language questions into Prolog terms and performs exact inference over a structured knowledge base, leveraging tabling to ensure both completeness and efficiency. Contribution/Results: LP-LM pioneers deep integration of DCG-based semantic parsing with Prolog knowledge-base execution—enabling fully verifiable, traceable, and hallucination-free answers. Experiments demonstrate 100% accuracy across multiple simple QA benchmarks, substantially outperforming state-of-the-art LLMs. Moreover, inference time scales linearly with input length, ensuring both reliability and scalability.
📝 Abstract
Large language models (LLMs) are able to generate human-like responses to user queries. However, LLMs exhibit inherent limitations, especially because they hallucinate. This paper introduces LP-LM, a system that grounds answers to questions in known facts contained in a knowledge base (KB), facilitated through semantic parsing in Prolog, and always produces answers that are reliable. LP-LM generates a most probable constituency parse tree along with a corresponding Prolog term for an input question via Prolog definite clause grammar (DCG) parsing. The term is then executed against a KB of natural language sentences also represented as Prolog terms for question answering. By leveraging DCG and tabling, LP-LM runs in linear time in the size of input sentences for sufficiently many grammar rules. Performing experiments comparing LP-LM with current well-known LLMs in accuracy, we show that LLMs hallucinate on even simple questions, unlike LP-LM.