Automating Knowledge Discovery from Scientific Literature via LLMs: A Dual-Agent Approach with Progressive Ontology Prompting

📅 2024-08-20

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Scientific literature is vast, highly specialized, and inconsistently reported, which makes manual extraction of interventions slow and error-prone. To address this, the paper proposes a framework that pairs the Progressive Ontology Prompting (POP) algorithm, which runs a prioritized breadth-first search (BFS) over a predefined ontology to generate prompts, with LLM-Duo, a dual-agent system in which an explorer and an evaluator work collaboratively and adversarially. By combining large language models (LLMs), ontology engineering, and multi-agent reasoning, the approach automates structured knowledge discovery. Applied to speech-language therapy, it identifies 2,421 interventions from 64,177 research articles and curates them into a publicly accessible intervention knowledge base. In the reported experiments, it produces more accurate and complete annotations than the baselines.

📝 Abstract

To automate knowledge discovery from a vast volume of literature, we introduce a novel framework based on large language models (LLMs) that combines a progressive ontology prompting (POP) algorithm with a dual-agent system named LLM-Duo. The POP algorithm runs a prioritized breadth-first search (BFS) across a predefined ontology to generate structured prompt templates and action orders, which guide LLMs to discover knowledge automatically. LLM-Duo employs two specialized LLM agents, an explorer and an evaluator, that work collaboratively and adversarially to make the discovery and annotation processes more reliable. Experiments demonstrate that our method outperforms advanced baselines, yielding more accurate and complete annotations. To validate the method in a real-world setting, we apply it in a case study of speech-language intervention discovery, where it identifies 2,421 interventions from 64,177 research articles in the speech-language therapy domain. We curate these findings into a publicly accessible intervention knowledge base that holds significant potential to benefit the speech-language therapy community.
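The prioritized BFS over an ontology that the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the toy ontology, its concept and relation names, the numeric priorities, the prompt wording, and the function name `pop_prompt_order` are all assumptions made for illustration. The sketch only shows how a level-by-level traversal with per-edge priorities yields an ordered list of prompt templates.

```python
import heapq
from itertools import count

# Hypothetical toy ontology (not from the paper): each edge is
# (relation, target concept, priority). Lower priority values are visited first.
ONTOLOGY = {
    "Intervention": [
        ("targets", "Disorder", 1),
        ("delivered_by", "Provider", 2),
        ("has_outcome", "Outcome", 1),
    ],
    "Disorder": [("affects", "Population", 1)],
    "Outcome": [("measured_by", "Instrument", 1)],
    "Provider": [],
    "Population": [],
    "Instrument": [],
}


def pop_prompt_order(ontology, root):
    """Walk the ontology breadth-first, ordering each level by edge priority.

    Returns a list of (concept, prompt_template) pairs in visiting order.
    """
    tie = count()  # unique tie-breaker so heap entries never compare names
    visited = {root}
    # Heap entries: (depth, priority, tie, source concept, relation, concept)
    frontier = [(0, 0, next(tie), None, None, root)]
    steps = []
    while frontier:
        depth, _prio, _tie, source, relation, concept = heapq.heappop(frontier)
        if source is None:
            prompt = f"List every {concept} described in the article."
        else:
            prompt = (
                f"For each {source} found, what {concept} "
                f"is linked to it by '{relation}'?"
            )
        steps.append((concept, prompt))
        for rel, target, prio in ontology.get(concept, []):
            if target not in visited:
                visited.add(target)
                heapq.heappush(
                    frontier, (depth + 1, prio, next(tie), concept, rel, target)
                )
    return steps


if __name__ == "__main__":
    for step, (concept, prompt) in enumerate(
        pop_prompt_order(ONTOLOGY, "Intervention"), start=1
    ):
        print(step, concept, "->", prompt)
```

Sorting the heap by depth first keeps the traversal breadth-first, while the priority field decides the order among concepts on the same level. The resulting list plays the role of the "action order" in this sketch.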

Problem

Research questions and friction points this paper is trying to address.

Automating discovery of interventions from vast scientific literature

Overcoming specialized terminology and inconsistent reporting formats

Enhancing annotation accuracy via dual-LLM collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

Progressive ontology prompting algorithm for structured annotation

Dual-LLM system with explorer and evaluator agents

Automated intervention discovery from large scientific literature
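The explorer and evaluator interaction listed above can be sketched as a simple propose-and-critique loop. The summary does not specify the actual protocol, so everything here is assumed: the function name `duo_annotate`, the round limit, the feedback format, and the two stub agents, which stand in for real LLM calls with plain Python functions.

```python
def duo_annotate(prompt, explorer, evaluator, max_rounds=3):
    """Run a propose/critique loop until the evaluator accepts or rounds run out.

    explorer(prompt, feedback) -> answer string
    evaluator(prompt, answer)  -> (accepted: bool, feedback: str)
    """
    feedback = None
    answer = None
    for round_no in range(1, max_rounds + 1):
        answer = explorer(prompt, feedback)
        accepted, feedback = evaluator(prompt, answer)
        if accepted:
            return {"answer": answer, "accepted": True, "rounds": round_no}
    # The evaluator never accepted; return the last attempt, flagged as such.
    return {"answer": answer, "accepted": False, "rounds": max_rounds}


# Stub agents (hypothetical): a real system would call an LLM in each of these.
def stub_explorer(prompt, feedback):
    if feedback is None:
        return "therapy sessions"
    # Revise the first draft in response to the evaluator's critique.
    return "therapy sessions (dosage: 4 per week)"


def stub_evaluator(prompt, answer):
    if "dosage" in answer:
        return True, "ok"
    return False, "The answer omits the dosage asked for in the prompt."


if __name__ == "__main__":
    print(duo_annotate("What intervention and dosage are reported?",
                       stub_explorer, stub_evaluator))
```

Returning the `accepted` flag alongside the answer lets downstream code separate evaluator-approved annotations from ones that exhausted the round limit, which is one plausible way to surface unreliable extractions for manual review.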
