AstroRAG -- A Pagerank-Based Retrieval-Augmented Generation Pipeline for Question Answering in Astronomy

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two problems in astronomical question answering: large language models tend to generate factual errors, and conventional retrieval-augmented generation (RAG) approaches tend to introduce irrelevant context that degrades response quality. The authors propose a two-stage RAG pipeline that first retrieves diverse candidate documents using Maximal Marginal Relevance (MMR), then re-ranks them with a reader-driven PageRank algorithm applied to a similarity graph, selecting a compact, mutually supportive context under a strict token budget. Combined with token-aware chunking and per-instance, transient Elasticsearch indexing, the method requires no training, preserves privacy, and prevents cross-task information leakage. Evaluated on the AstroQA benchmark, Mistral-7B augmented with this RAG framework achieves 79.49% accuracy and a 79.49% F1 score, nearly double the performance of the base model.
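The first retrieval stage can be illustrated with a minimal sketch of greedy MMR selection. This is not the authors' code: the function names `cosine` and `mmr_select`, the trade-off weight `lam`, and the use of cosine similarity over plain Python lists as embeddings are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query_vec, doc_vecs, k=3, lam=0.7):
    """Greedy Maximal Marginal Relevance: return indices of up to k documents.

    lam (hypothetical parameter name) trades relevance to the query against
    redundancy with documents that were already selected.
    """
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    selected = []
    remaining = list(range(len(doc_vecs)))

    def mmr_score(i):
        # Redundancy is the highest similarity to anything already picked.
        redundancy = max(
            (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
        )
        return lam * relevance[i] - (1.0 - lam) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]]  # doc 1 duplicates doc 0
    print(mmr_select(query, docs, k=2, lam=0.3))
```

With a low `lam`, the exact duplicate is penalized and the less similar document is preferred as the second pick; with `lam=1.0` the procedure reduces to plain relevance ranking.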

📝 Abstract

Large language models (LLMs) demonstrate strong performance in natural language processing but often generate factual errors when relying solely on parametric knowledge. Retrieval-Augmented Generation (RAG) mitigates these errors by grounding responses in external evidence, yet conventional retrieve-and-dump approaches frequently introduce irrelevant context that degrades answer quality. In this work, we present AstroRAG -- a PageRank-based RAG pipeline adapted for question answering in astronomy. The system performs token-aware chunking and per-instance, ephemeral indexing in Elasticsearch, then executes a two-stage retrieval: (i) Maximal Marginal Relevance (MMR) to obtain a small, diverse candidate set and (ii) a reader-driven PageRank (PR) re-ranking on a similarity graph to identify a compact, mutually supportive context under a strict token budget. Our design is training-free, privacy-preserving, and reproducible, as each instance is processed through transient indexing to prevent cross-task leakage. We evaluate the pipeline on the AstroQA benchmark for astronomy QA and demonstrate competitive performance across all difficulty levels. In particular, the RAG-enhanced Mistral-7B achieves 79.49% accuracy and a 79.49% F1-score, nearly doubling the performance of its non-RAG counterpart. These results highlight the effectiveness of disciplined retrieval and refinement in boosting domain-specific reasoning, establishing a robust foundation for extending RAG to other scientific fields.
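The second stage, PageRank re-ranking on a similarity graph under a token budget, might look roughly like the sketch below. The abstract does not specify what "reader-driven" means, so this example assumes a personalized PageRank whose restart distribution is the query similarity of each chunk; that choice, the function name `pagerank_select`, the `damping` and `iters` parameters, and the greedy packing step are all hypothetical and not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def pagerank_select(query_vec, chunk_vecs, chunk_tokens, budget,
                    damping=0.85, iters=50):
    """Rank chunks by personalized PageRank, then pack them under a token budget."""
    n = len(chunk_vecs)
    if n == 0:
        return []

    # Edge weights: non-negative cosine similarity between distinct chunks.
    w = [[max(cosine(chunk_vecs[i], chunk_vecs[j]), 0.0) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    out = [sum(row) for row in w]

    # Assumed restart distribution: normalized similarity to the query.
    p = [max(cosine(query_vec, v), 0.0) for v in chunk_vecs]
    total = sum(p)
    p = [x / total for x in p] if total > 0 else [1.0 / n] * n

    rank = p[:]
    for _ in range(iters):
        # Mass on nodes without outgoing edges is redistributed according to p.
        dangling = sum(rank[i] for i in range(n) if out[i] == 0)
        rank = [
            (1.0 - damping) * p[j]
            + damping * (
                sum(rank[i] * w[i][j] / out[i] for i in range(n) if out[i] > 0)
                + dangling * p[j]
            )
            for j in range(n)
        ]

    # Greedy packing: take chunks in rank order while they still fit the budget.
    chosen, used = [], 0
    for i in sorted(range(n), key=lambda idx: rank[idx], reverse=True):
        if used + chunk_tokens[i] <= budget:
            chosen.append(i)
            used += chunk_tokens[i]
    return chosen


if __name__ == "__main__":
    vecs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
    print(pagerank_select([1.0, 0.0], vecs, [50, 50, 50], budget=100))
```

Chunks that are similar both to the query and to each other reinforce one another's scores, which is one way to read "mutually supportive context"; the greedy packing simply skips any chunk that would exceed the remaining budget.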

Problem

Research questions and friction points this paper is trying to address.

factual errors

retrieval-augmented generation

irrelevant context

astronomy question answering

domain-specific reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

PageRank-based retrieval

Retrieval-Augmented Generation (RAG)

token-aware chunking