SimulRAG: Simulator-based RAG for Grounding LLMs in Long-form Scientific QA

πŸ“… 2025-09-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Large language models (LLMs) frequently generate hallucinated content when producing long-form scientific question-answering responses, especially in complex scenarios requiring integration of multiple concepts and empirical evidence. Existing retrieval-augmented generation (RAG) methods struggle to incorporate scientific simulators as knowledge sources due to the absence of standardized simulator retrieval interfaces and efficient mechanisms for verifying lengthy, multi-step outputs. To address this, we propose SimulRAGβ€”a novel RAG framework that natively integrates scientific simulators into the generation pipeline. It introduces a universal simulator retrieval interface and a claim-level iterative generation and refinement mechanism grounded in uncertainty estimation (UE) and simulator boundary assessment (SBA), enabling joint textual and numerical reasoning. Evaluated on climate science and epidemiology benchmarks, SimulRAG achieves a 30.4% increase in information density and a 16.3% improvement in factual accuracy over conventional RAG, significantly enhancing both the credibility and efficiency of long-answer generation.

πŸ“ Abstract
Large language models (LLMs) show promise in solving scientific problems. They can help generate long-form answers for scientific questions, which are crucial for comprehensive understanding of complex phenomena that require detailed explanations spanning multiple interconnected concepts and evidence. However, LLMs often suffer from hallucination, especially in the challenging task of long-form scientific question answering. Retrieval-Augmented Generation (RAG) approaches can ground LLMs by incorporating external knowledge sources to improve trustworthiness. In this context, scientific simulators, which play a vital role in validating hypotheses, offer a particularly promising retrieval source to mitigate hallucination and enhance answer factuality. However, existing RAG approaches cannot be directly applied for scientific simulation-based retrieval due to two fundamental challenges: how to retrieve from scientific simulators, and how to efficiently verify and update long-form answers. To overcome these challenges, we propose the simulator-based RAG framework (SimulRAG) and provide a long-form scientific QA benchmark covering climate science and epidemiology with ground truth verified by both simulations and human annotators. In this framework, we propose a generalized simulator retrieval interface to transform between textual and numerical modalities. We further design a claim-level generation method that utilizes uncertainty estimation scores and simulator boundary assessment (UE+SBA) to efficiently verify and update claims. Extensive experiments demonstrate SimulRAG outperforms traditional RAG baselines by 30.4% in informativeness and 16.3% in factuality. UE+SBA further improves efficiency and quality for claim-level generation.
Problem

Research questions and friction points this paper is trying to address.

Grounding LLMs in long-form scientific QA using simulator-based retrieval
Mitigating hallucination by integrating scientific simulators into RAG framework
Developing efficient claim verification through uncertainty estimation and boundary assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulator-based RAG framework for scientific QA
Generalized simulator retrieval interface transforms between textual and numerical modalities
Uncertainty estimation with boundary assessment verifies claims
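The claim-level verification idea (UE+SBA) can be sketched as a simple routing loop: claims the model is already confident about skip simulation, while uncertain claims are sent to the simulator only if they fall within its operating boundary. All names, thresholds, and the `simulate` stub below are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Claim:
    text: str
    uncertainty: float   # UE score in [0, 1]; higher means less confident
    in_sim_bounds: bool  # SBA result: can the simulator evaluate this claim?


def verify_claims(
    claims: List[Claim],
    ue_threshold: float,
    simulate: Callable[[Claim], bool],
) -> List[Claim]:
    """Keep confident claims as-is; route uncertain, simulatable claims
    to the simulator; drop claims that are refuted or unverifiable."""
    verified: List[Claim] = []
    for claim in claims:
        if claim.uncertainty < ue_threshold:
            verified.append(claim)  # confident enough: skip simulation
        elif claim.in_sim_bounds and simulate(claim):
            verified.append(claim)  # simulator confirms the claim
        # else: uncertain and out-of-bounds or refuted -> regenerate/drop
    return verified


# Example with a stub simulator that confirms every claim it is given.
claims = [
    Claim("CO2 doubling raises mean temperature", 0.1, False),
    Claim("R0 above 1 implies epidemic growth", 0.9, True),
    Claim("Sea level falls under warming", 0.9, False),
]
kept = verify_claims(claims, ue_threshold=0.5, simulate=lambda c: True)
```

Routing by uncertainty first is what makes this efficient: only the subset of claims that are both doubtful and within the simulator's boundary incur a (potentially expensive) simulation call.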
πŸ”Ž Similar Papers
No similar papers found.