Formal Reasoning for Intelligent QA Systems: A Case Study in the Educational Domain

📅 2025-09-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the lack of causal reasoning and verifiability in large language models (LLMs) for high-stakes question answering in closed domains (e.g., education), this paper proposes MCFR, a novel framework that tightly integrates LLMs with formal model checking. MCFR employs a neuro-symbolic architecture to automatically translate natural-language questions into formal specifications, which it then verifies over a constructed state-transition model. It supports verifiable multi-step dynamic reasoning, including conditional transitions and procedural progress. Evaluated on EduMC-QA, a custom educational benchmark, MCFR significantly outperforms baseline LLMs, including ChatGPT, DeepSeek, and Claude, in reasoning accuracy and logical consistency. Moreover, its formally grounded inference supports factual correctness, interpretability, and regulatory compliance.

📝 Abstract
Reasoning is essential for closed-domain QA systems in which procedural correctness and policy compliance are critical. While large language models (LLMs) have shown strong performance on many reasoning tasks, recent work reveals that their reasoning traces are often unfaithful, serving more as plausible justifications than as causally grounded derivations. Efforts to combine LLMs with symbolic engines (e.g., Prover9, Z3) have improved reliability but remain limited to static forms of logic, struggling with dynamic, state-based reasoning such as multi-step progressions and conditional transitions. In this paper, we propose MCFR (Model Checking for Formal Reasoning), a neuro-symbolic framework that integrates LLMs with model checking to support property verification. MCFR translates natural language into formal specifications and verifies them over transition models. To support evaluation, we introduce EduMC-QA, a benchmark dataset grounded in real academic procedures. Our results show that MCFR improves reasoning faithfulness and interpretability, offering a viable path toward verifiable QA in high-stakes closed-domain applications. In addition to evaluating MCFR, we compare its performance with state-of-the-art LLMs such as ChatGPT, DeepSeek, and Claude to contextualize its effectiveness.
Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning faithfulness in closed-domain QA systems
Integrating LLMs with model checking for dynamic verification
Addressing limitations of static logic in state-based reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates LLMs with model checking
Translates natural language to formal specifications
Verifies properties over transition models
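To make the last step concrete, here is a minimal sketch of what "verifying a property over a transition model" can look like. The transition system (a hypothetical course-registration procedure) and the state names are illustrative inventions, not taken from the paper; a breadth-first reachability check stands in for a full model checker (the paper's actual tooling and specification language are not described on this page).

```python
from collections import deque

# Hypothetical state-transition model of an academic procedure (illustrative only).
TRANSITIONS = {
    "enrolled": ["prereq_done"],
    "prereq_done": ["registered"],
    "registered": ["completed", "withdrawn"],
    "completed": [],
    "withdrawn": [],
}

def reachable(start, goal, transitions):
    """Return True if `goal` is reachable from `start` (an EF-style property)."""
    seen, frontier = {start}, deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            return True
        for nxt in transitions.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

# "Can a student complete the course without ever registering?"
# Remove the `registered` state from all successor lists and re-check:
no_register = {s: [t for t in ts if t != "registered"]
               for s, ts in TRANSITIONS.items()}
print(reachable("enrolled", "completed", TRANSITIONS))   # True
print(reachable("enrolled", "completed", no_register))   # False
```

In MCFR's pipeline, an LLM would produce the specification (the property to check) from the natural-language question, and a model checker would answer it over a model like the one above, yielding a verifiable result rather than a free-form generation.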
Tuan Bui
Ho Chi Minh City University of Technology (HCMUT) - VNU-HCM
An Nguyen
Friedrich-Alexander University Erlangen-Nürnberg (FAU)
Phat Thai
Ho Chi Minh City University of Technology (HCMUT) - VNU-HCM
Minh Hua
Ho Chi Minh City University of Technology (HCMUT) - VNU-HCM
Ngan Pham L. N.
Ho Chi Minh City University of Technology (HCMUT) - VNU-HCM
Ngan Pham T. B.
Ho Chi Minh City University of Technology (HCMUT) - VNU-HCM
Dung Le
Ho Chi Minh City University of Technology (HCMUT) - VNU-HCM
Long Nguyen
Graduate Student, Carnegie Mellon University (biological and biomedical sciences, digital pathology, computational microscopy)
Thanh-Tung Tran
International University - VNU-HCM
Thang Bui
Ho Chi Minh City University of Technology (HCMUT) - VNU-HCM
Tho Quan
Unknown affiliation