AI Summary
This study addresses the challenge of retrieving fine-grained information from scientific literature to support analytical and decision-making tasks. Existing methods often fail to balance faithfulness to the source content with contextual relevance when answering research-oriented queries. To this end, the paper introduces IntraView, a new task formulation, and proposes IntrAgent, an intelligent agent that emulates human reading behavior through a two-stage framework for context-anchored information retrieval. The first stage uses structural knowledge reasoning to prioritize relevant sections. The second stage reads those sections iteratively to extract details and generate concise answers. The authors also construct IntraBench, a cross-disciplinary, expert-annotated benchmark for evaluation. Across seven prominent large language model backbones, IntrAgent achieves an average 13.2% improvement in cross-domain accuracy over state-of-the-art RAG and research-agent baselines.
Abstract
Scientific research relies on accurate information retrieval from literature to support analytical decisions. In this work, we introduce a new task, INformation reTRieval through literAture reVIEW (IntraView). It aims to automate fine-grained information retrieval that is faithfully grounded in the provided content and responds to research-driven queries. We also propose IntrAgent, an LLM-based agent that addresses this challenging task. In particular, IntrAgent is designed to mimic how humans read literature to retrieve information: they identify relevant sections and then iteratively extract key details to refine what they have found. It follows a two-stage pipeline. A Section Ranking stage prioritizes relevant literature sections through structural-knowledge-enabled reasoning. An Iterative Reading stage then continuously extracts details and synthesizes them into concise, contextually grounded answers. To support rigorous evaluation, we introduce IntraBench, a new benchmark of 315 test instances built from expert-authored questions paired with literature spanning five STEM domains. Across seven backbone LLMs, IntrAgent achieves on average 13.2% higher cross-domain accuracy than state-of-the-art RAG and research-agent baselines.
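The abstract describes the two-stage pipeline only at a high level, so the Python sketch below is illustrative and is not the authors' implementation. It keeps the stage structure: first rank sections, then read them in order and accumulate details. As an assumption, it replaces the LLM's structural-knowledge reasoning and detail extraction with simple keyword overlap. All names (`Section`, `rank_sections`, `iterative_read`, `answer`, `STOPWORDS`, `max_sections`) and the toy paper are hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical stopword list; IntrAgent itself relies on an LLM, not keywords.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "to", "and",
             "is", "was", "were", "what", "which", "how", "used"}


@dataclass
class Section:
    title: str
    text: str


def tokens(text: str) -> set[str]:
    """Lower-cased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_terms(query: str) -> set[str]:
    """Content words of the query (stopwords removed)."""
    return tokens(query) - STOPWORDS


def rank_sections(query: str, sections: list[Section]) -> list[Section]:
    """Stage 1 stand-in (assumption): order sections by query-term overlap,
    counting a title match twice as much as a body match as a crude proxy
    for reasoning over the paper's structure. Ties keep document order."""
    q = query_terms(query)

    def score(sec: Section) -> int:
        return 2 * len(q & tokens(sec.title)) + len(q & tokens(sec.text))

    return sorted(sections, key=score, reverse=True)


def iterative_read(query: str, ranked: list[Section],
                   max_sections: int = 3) -> list[str]:
    """Stage 2 stand-in (assumption): read sections in ranked order, keep
    sentences that mention a query term, and stop early once a section
    contributes nothing new."""
    q = query_terms(query)
    notes: list[str] = []
    for sec in ranked[:max_sections]:
        sentences = re.split(r"(?<=[.!?])\s+", sec.text.strip())
        new = [s for s in sentences if q & tokens(s) and s not in notes]
        if not new:
            break
        notes.extend(new)
    return notes


def answer(query: str, sections: list[Section]) -> str:
    """Run both stages and join the extracted details into one answer."""
    return " ".join(iterative_read(query, rank_sections(query, sections)))


# Toy three-section paper (made-up content, for illustration only).
paper = [
    Section("Introduction", "We study catalyst stability. Prior work is limited."),
    Section("Methods", "Samples were heated for two hours. The solvent was ethanol."),
    Section("Results", "Stability improved under all tested conditions."),
]
print(answer("What solvent was used in the methods?", paper))  # prints: The solvent was ethanol.
```

In IntrAgent, both stages are driven by a backbone LLM. In this sketch, `score` and the sentence filter in `iterative_read` are plausible places to plug in LLM calls. The abstract does not specify the actual prompts, structural knowledge, or stopping criterion.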