BioVerge: A Comprehensive Benchmark and Study of Self-Evaluating Agents for Biomedical Hypothesis Generation

📅 2025-11-12
🤖 AI Summary
Biomedical hypothesis generation is often constrained by reliance on single data sources and rigid extraction paradigms, limiting the discovery of complex, novel relationships; meanwhile, LLM-based agents remain underutilized in this domain due to the absence of standardized benchmarks and execution environments. To address these challenges, we introduce BioVerge, the first standardized benchmark specifically designed for biomedical hypothesis generation, together with BioVerge Agent, a reproducible agent framework. The agent follows a ReAct-inspired generate–evaluate dual-module architecture: a Generation module proposes hypotheses and an Evaluation module self-assesses them, enabling autonomous, iterative refinement. The framework integrates unstructured PubMed literature with structured knowledge bases and supports multi-step retrieval and collaborative reasoning. Experimental results demonstrate significant improvements in hypothesis novelty (+28.6%) and scientific relevance (+31.4%). They also confirm the complementary value of textual and structured information and reveal systematic effects of agent architecture on exploration diversity.
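The interplay between the Generation and Evaluation modules can be illustrated with a small control-flow sketch. This is a hypothetical illustration, not the BioVerge Agent implementation. The `retrieve`, `generate`, and `evaluate` callables stand in for search and LLM calls. The 0-1 novelty and relevance scores, the min-based aggregate, and the 0.7 acceptance threshold are all assumptions. The loop is only loosely ReAct-style. Each iteration takes a retrieval action, observes the results, generates a proposal, and self-evaluates it. Explicit reasoning traces and tool selection are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    novelty: float    # assumed 0-1 scale, higher = further from known findings
    relevance: float  # assumed 0-1 scale, higher = closer to the research question
    feedback: str     # critique fed back into the next iteration

    def score(self) -> float:
        # Conservative aggregate (an assumption): only as good as the weaker axis.
        return min(self.novelty, self.relevance)

    def passes(self, threshold: float) -> bool:
        return self.score() >= threshold


Step = Tuple[str, Evaluation]


def run_agent(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str], str], str],
    evaluate: Callable[[str, str], Evaluation],
    threshold: float = 0.7,
    max_iters: int = 5,
) -> Tuple[str, List[Step]]:
    """Hypothetical generate-evaluate loop; not the BioVerge Agent code."""
    context: List[str] = []
    feedback = ""
    trace: List[Step] = []
    for _ in range(max_iters):
        # Action + observation: retrieve evidence, steering the query with the last critique.
        context.extend(retrieve(f"{question} {feedback}".strip()))
        # Generation module: propose a hypothesis from the accumulated context.
        hypothesis = generate(question, context, feedback)
        # Evaluation module: self-assess the proposal.
        result = evaluate(question, hypothesis)
        trace.append((hypothesis, result))
        if result.passes(threshold):
            return hypothesis, trace
        feedback = result.feedback
    # No proposal cleared the bar: fall back to the best-scoring one.
    best_hypothesis, _ = max(trace, key=lambda step: step[1].score())
    return best_hypothesis, trace


# Toy stand-ins for retrieval and LLM calls (hypothetical, for control flow only).
def toy_retrieve(query: str) -> List[str]:
    return [f"evidence for: {query}"]


def toy_generate(question: str, context: List[str], feedback: str) -> str:
    base = f"DrugX modulates PathwayY in {question}"
    return base + " via GeneZ" if feedback else base


def toy_evaluate(question: str, hypothesis: str) -> Evaluation:
    specific = "via GeneZ" in hypothesis
    return Evaluation(
        novelty=0.8 if specific else 0.4,
        relevance=0.9,
        feedback="" if specific else "Too generic; name a mechanism.",
    )


best, trace = run_agent("DiseaseD", toy_retrieve, toy_generate, toy_evaluate)
print(best)        # DrugX modulates PathwayY in DiseaseD via GeneZ
print(len(trace))  # 2
```

With the min-based score, a proposal that is novel but off-topic is rejected, and so is one that is relevant but already known. A weighted sum of the two scores would be an equally plausible design choice.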

📝 Abstract
Hypothesis generation in biomedical research has traditionally centered on uncovering hidden relationships within vast scientific literature, often using methods like Literature-Based Discovery (LBD). Despite progress, current approaches typically depend on single data types or predefined extraction patterns, which restricts the discovery of novel and complex connections. Recent advances in Large Language Model (LLM) agents show significant potential, with capabilities in information retrieval, reasoning, and generation. However, their application to biomedical hypothesis generation has been limited by the absence of standardized datasets and execution environments. To address this, we introduce BioVerge, a comprehensive benchmark, and BioVerge Agent, an LLM-based agent framework, to create a standardized environment for exploring biomedical hypothesis generation at the frontier of existing scientific knowledge. Our dataset includes structured and textual data derived from historical biomedical hypotheses and PubMed literature, organized to support exploration by LLM agents. BioVerge Agent utilizes a ReAct-based approach with distinct Generation and Evaluation modules that iteratively produce and self-assess hypothesis proposals. Through extensive experimentation, we uncover key insights: 1) different architectures of BioVerge Agent influence exploration diversity and reasoning strategies; 2) structured and textual information sources each provide unique, critical contexts that enhance hypothesis generation; and 3) self-evaluation significantly improves the novelty and relevance of proposed hypotheses.
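The abstract says the dataset combines structured facts with PubMed text derived from historical hypotheses. The sketch below shows one way an agent might merge both sources into a single prompt context. It rests on hypothetical assumptions: every triple and passage carries a year, and evidence is cut off before a given year, so the agent sees only knowledge available at that time. `Triple`, `Passage`, `build_context`, and the placeholder entities (`GeneA`, `DiseaseB`, `doc-1`) are illustrative and are not BioVerge's actual schema or data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Triple:
    head: str
    relation: str
    tail: str
    year: int  # hypothetical: year the fact first became known


@dataclass(frozen=True)
class Passage:
    doc_id: str  # placeholder ID, not a real PMID
    year: int
    text: str


def build_context(
    entity: str,
    triples: List[Triple],
    passages: List[Passage],
    cutoff_year: int,
    max_passages: int = 3,
) -> str:
    """Merge KB facts and literature about `entity` known strictly before `cutoff_year`."""
    kb_lines = [
        f"({t.head}, {t.relation}, {t.tail})"
        for t in triples
        if t.year < cutoff_year and entity in (t.head, t.tail)
    ]
    hits = [
        p for p in passages
        if p.year < cutoff_year and entity.lower() in p.text.lower()
    ]
    hits.sort(key=lambda p: p.year, reverse=True)  # most recent pre-cutoff evidence first
    text_lines = [f"[{p.doc_id}, {p.year}] {p.text}" for p in hits[:max_passages]]
    return "\n".join(["# Structured facts", *kb_lines, "# Literature", *text_lines])


triples = [
    Triple("GeneA", "associated_with", "DiseaseB", 2015),
    Triple("GeneA", "interacts_with", "ProteinC", 2021),
]
passages = [
    Passage("doc-1", 2016, "GeneA expression is elevated in DiseaseB tissue."),
    Passage("doc-2", 2022, "GeneA knockout alters ProteinC levels."),
]
context = build_context("GeneA", triples, passages, cutoff_year=2020)
print(context)
```

The strict `year < cutoff_year` filter is one way to stop the agent from seeing evidence that post-dates the hypothesis it is asked to rediscover. The abstract does not say whether BioVerge applies a temporal cutoff like this, or how.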
Problem

Research questions and friction points this paper is trying to address.

Current biomedical hypothesis-generation methods rely on limited data types and predefined extraction patterns
LLM agents lack standardized datasets and execution environments for biomedical discovery
Existing approaches restrict discovery of novel and complex biomedical connections
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agent framework for biomedical hypothesis generation
ReAct-based approach with Generation and Evaluation modules
Self-evaluation improves hypothesis novelty and relevance

Authors
Fuyi Yang (University of California, Los Angeles)
Chenchen Ye (University of California, Los Angeles)
Mingyu Derek Ma (Prescient Design, Genentech/Roche)
Yijiao Xiao (University of California, Los Angeles)
Matthew Yang (University of California, Los Angeles)
Wei Wang (University of California, Los Angeles)