From Archives to Decisions: Multi-Agent Pharmaceutical Co-Scientist for Traceable Drug Discovery and Reverse Translation

📅 2025-11-22

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Vast, heterogeneous archives of historical pharmaceutical R&D data, much of it from terminated projects, remain underused, which hampers reverse translational research. To address this, we propose DiscoVerse, a multi-agent system for reverse translation. It is built on role-specialized agents that combine large language models, semantic retrieval, cross-document linking, and source-provenance tracing to produce auditable, traceable knowledge synthesis. DiscoVerse is evaluated on a subset of Roche's confidential drug-discovery archive spanning more than four decades, and to the authors' knowledge it is the first agentic framework systematically assessed on real pharmaceutical data for reverse translation. Across seven benchmark queries covering 180 molecules, it achieves recall ≥0.99 and precision between 0.71 and 0.91. Blinded expert evaluation indicates that it integrates preclinical and clinical evidence faithfully, reconstructing discontinuation rationales and organ-specific toxicity assessments with links back to the source documents.


📝 Abstract

Pharmaceutical research and development has accumulated vast, heterogeneous archives of data. Much of this knowledge stems from discontinued programs, and reusing these archives is invaluable for reverse translation. However, in practice, such reuse is often infeasible. In this work, we introduce DiscoVerse, a multi-agent co-scientist designed to support pharmaceutical research and development. The system implements semantic retrieval, cross-document linking, and auditable synthesis on a large historical corpus from Roche. To validate our approach at real-world scale, we selected a subset of 180 molecules from the Roche research repositories, covering over 0.87 billion BPE tokens and more than four decades of research. Given that automated evaluation metrics are poorly aligned with scientific utility, we evaluate the performance of DiscoVerse using blinded expert evaluation of source-linked outputs. To our knowledge, this is the first agentic framework systematically assessed on real pharmaceutical data for reverse translation, enabled by authorized access to confidential, end-to-end drug-development archives. Our contributions include role-specialized agent designs aligned with scientist workflows; human-in-the-loop support for reverse translation; expert evaluation; and a large-scale demonstration showing promising answer accuracy and decision-making insights. In brief, across seven benchmark queries covering 180 molecules, DiscoVerse achieved near-perfect recall ($\geq 0.99$) with moderate precision ($0.71$–$0.91$), while qualitative assessments of discontinuation rationale and organ-specific toxicity showed faithful, source-linked synthesis across preclinical and clinical evidence.
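High recall with lower precision means the system rarely missed a relevant molecule but also returned some that the reference answer did not include. A minimal sketch of how set-based precision and recall could be computed for one benchmark query, assuming each query yields a set of molecule identifiers that is compared against an expert-curated answer set. The paper's exact scoring protocol is not described on this page, and `score_query`, `QueryScore`, and the `MOL-…` identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryScore:
    precision: float
    recall: float


def score_query(retrieved: set[str], relevant: set[str]) -> QueryScore:
    """Set-based precision and recall for a single benchmark query.

    retrieved: molecule IDs the system returned (hypothetical identifiers).
    relevant:  molecule IDs an expert marked as correct (assumed ground truth).
    """
    if not relevant:
        raise ValueError("recall is undefined without at least one relevant item")
    true_positives = len(retrieved & relevant)
    # Precision: fraction of returned molecules that are correct.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of correct molecules that were returned.
    recall = true_positives / len(relevant)
    return QueryScore(precision=precision, recall=recall)


# Hypothetical query: 10 relevant molecules, system returns all 10 plus 2 extras.
relevant = {f"MOL-{i:03d}" for i in range(10)}
retrieved = relevant | {"MOL-900", "MOL-901"}
score = score_query(retrieved, relevant)
print(round(score.recall, 2), round(score.precision, 2))  # 1.0 0.83
```

In this toy case every relevant molecule is found (recall 1.0), while two extra hits pull precision down to 10/12, the same qualitative pattern as the reported results.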

Problem

Research questions and friction points this paper is trying to address.

Reusing data from discontinued pharmaceutical programs for drug discovery

Enabling traceable reverse translation from historical archives

Overcoming the practical infeasibility of reusing large, heterogeneous research archives

Innovation

Methods, ideas, or system contributions that make the work stand out.

Role-specialized multi-agent system aligned with scientist workflows

Semantic retrieval and cross-document linking with auditable, source-linked synthesis

Human-in-the-loop support for reverse translation
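The retrieval, linking, and provenance ideas listed above can be illustrated with a toy sketch. It assumes, hypothetically, that archive passages carry a document identifier, a page number, and a molecule identifier, and it substitutes bag-of-words cosine similarity for the learned embeddings a real semantic retriever would use, so it only matches shared words. Every hit keeps its source location, and hits are grouped by molecule so that evidence from separate reports is linked. None of the names (`Passage`, `retrieve`, `link_by_molecule`, the `REPORT-…` and `MOL-…` IDs) come from DiscoVerse itself.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str    # hypothetical archive document identifier
    page: int      # location inside the source document, kept for provenance
    molecule: str  # hypothetical molecule identifier the passage refers to
    text: str


def _vector(text: str) -> Counter:
    # Bag-of-words term counts: a crude stand-in for learned embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[Passage], k: int = 3) -> list[tuple[float, Passage]]:
    """Return up to k passages with nonzero similarity, best first."""
    q = _vector(query)
    scored = [(_cosine(q, _vector(p.text)), p) for p in passages]
    scored = [hit for hit in scored if hit[0] > 0]
    scored.sort(key=lambda hit: hit[0], reverse=True)
    return scored[:k]


def link_by_molecule(hits: list[tuple[float, Passage]]) -> dict[str, list[str]]:
    """Group hits by molecule, keeping a citation string for each source."""
    linked: dict[str, list[str]] = {}
    for _, p in hits:
        linked.setdefault(p.molecule, []).append(f"{p.doc_id}, p. {p.page}")
    return linked


# Hypothetical mini-archive with two reports on the same molecule.
passages = [
    Passage("REPORT-A", 4, "MOL-001", "Hepatotoxicity observed in rat study; program discontinued"),
    Passage("REPORT-B", 12, "MOL-001", "Phase I trial halted after liver enzyme elevation"),
    Passage("REPORT-C", 2, "MOL-002", "Favorable pharmacokinetics in dog"),
]
hits = retrieve("liver toxicity discontinued", passages)
print(link_by_molecule(hits))  # {'MOL-001': ['REPORT-A, p. 4', 'REPORT-B, p. 12']}
```

Note that "hepatotoxicity" and "toxicity" do not match here; closing that kind of vocabulary gap is exactly why a production system would use embedding-based semantic retrieval rather than word overlap.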
