From Archives to Decisions: Multi-Agent Pharmaceutical Co-Scientist for Traceable Drug Discovery and Reverse Translation

📅 2025-11-22

🤖 AI Summary
Vast, heterogeneous historical pharmaceutical R&D data, particularly archives of terminated projects, remain underused, which hampers reverse translational research. To address this, we propose DiscoVerse, a multi-agent collaborative system for reverse translation. It is built on a role-specialized agent architecture that integrates large language models, semantic retrieval, cross-document linking, and source-provenance tracing to enable auditable, traceable knowledge synthesis. To our knowledge, DiscoVerse is the first agentic framework systematically assessed on real pharmaceutical data for reverse translation, using Roche's confidential drug discovery corpus spanning more than four decades. Across seven benchmark queries covering 180 molecules, it achieves recall ≥0.99 and precision between 0.71 and 0.91. Blinded expert evaluation indicates that it integrates preclinical and clinical evidence faithfully when reconstructing discontinuation rationales and organ-specific toxicity assessments.

📝 Abstract
Pharmaceutical research and development has accumulated vast, heterogeneous archives of data. Much of this knowledge stems from discontinued programs, and reusing these archives is invaluable for reverse translation. However, in practice, such reuse is often infeasible. In this work, we introduce DiscoVerse, a multi-agent co-scientist designed to support pharmaceutical research and development. The system implements semantic retrieval, cross-document linking, and auditable synthesis on a large historical corpus from Roche. To validate our approach at real-world scale, we selected a subset of 180 molecules from the Roche research repositories, covering over 0.87 billion BPE tokens and more than four decades of research. Given that automated evaluation metrics are poorly aligned with scientific utility, we evaluate the performance of DiscoVerse using blinded expert evaluation of source-linked outputs. To our knowledge, this is the first agentic framework systematically assessed on real pharmaceutical data for reverse translation, enabled by authorized access to confidential, end-to-end drug-development archives. Our contributions include role-specialized agent designs aligned with scientist workflows; human-in-the-loop support for reverse translation; expert evaluation; and a large-scale demonstration showing promising answer accuracy and decision-making insights. In brief, across seven benchmark queries covering 180 molecules, DiscoVerse achieved near-perfect recall ($\geq 0.99$) with moderate precision ($0.71$–$0.91$), while qualitative assessments of discontinuation rationale and organ-specific toxicity showed faithful, source-linked synthesis across preclinical and clinical evidence.
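
To make the reported recall and precision figures concrete, the following is a minimal sketch of how the two metrics could be computed for a single benchmark query. It assumes each query has an expert-curated set of relevant molecules and that the system returns a set of molecule identifiers; the function name, the identifiers, and the set-based formulation are illustrative assumptions, not the paper's evaluation code.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one query, treating results as sets."""
    true_positives = len(retrieved & relevant)
    # Precision: fraction of returned molecules that are actually relevant.
    precision = true_positives / len(retrieved) if retrieved else 0.0
    # Recall: fraction of relevant molecules that were returned.
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical molecule IDs, invented for illustration only.
relevant = {"MOL-001", "MOL-002", "MOL-003", "MOL-004"}
retrieved = {"MOL-001", "MOL-002", "MOL-003", "MOL-004", "MOL-017"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case every relevant molecule is returned along with one false positive, which mirrors the high-recall, moderate-precision pattern described in the abstract: nothing relevant is missed, and a reviewer filters out the extra hits.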
Problem

Research questions and friction points this paper is trying to address.

Reusing discontinued pharmaceutical data for drug discovery
Enabling traceable reverse translation from historical archives
Overcoming the practical barriers to reusing heterogeneous research data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent system for pharmaceutical data analysis
Semantic retrieval and cross-document linking technology
Human-in-the-loop reverse translation framework
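
As a rough illustration of what source-linked retrieval means in practice, the sketch below ranks archive passages against a query and keeps the originating document identifier attached to every hit, so that any synthesized answer can cite where its evidence came from. It is a toy under stated assumptions: bag-of-words cosine similarity stands in for the semantic (embedding-based) retrieval the paper describes, and the `Passage` type, document IDs, and texts are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Passage:
    doc_id: str  # hypothetical archive document identifier (the provenance)
    text: str


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a crude stand-in for a learned embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[Passage], top_k: int = 2) -> list[tuple[Passage, float]]:
    """Return the top_k passages with non-zero similarity, best first."""
    q = bag_of_words(query)
    scored = [(p, cosine(q, bag_of_words(p.text))) for p in passages]
    scored = [(p, s) for p, s in scored if s > 0]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Invented example passages; not drawn from any real archive.
passages = [
    Passage("tox-report-A", "liver toxicity observed in rat study"),
    Passage("clin-summary-B", "program discontinued after liver toxicity signal in phase 1"),
    Passage("chem-note-C", "synthesis route optimized for yield"),
]

hits = retrieve("liver toxicity", passages)
cited_sources = [passage.doc_id for passage, _ in hits]
print(cited_sources)
```

Because each hit carries its `doc_id`, a preclinical report and a clinical summary about the same finding surface together with their sources, which is the property a human reviewer needs in order to audit the output.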
Xiaochen Zheng
Assistant Professor, Southern University of Science and Technology
Internet of Things, Cognitive Digital Twins, Semantic Modelling, MBSE
Alvaro Serra
Predictive Modelling, F. Hoffmann-La Roche Ltd., Basel, Switzerland
Ilya Schneider Chernov
Predictive Modelling, F. Hoffmann-La Roche Ltd., Basel, Switzerland
Maddalena Marchesi
Clinical Safety, F. Hoffmann-La Roche Ltd., Basel, Switzerland
Eunice Musvasva
Translational Safety, F. Hoffmann-La Roche Ltd., Basel, Switzerland
Tatyana Y. Doktorova
Predictive Modelling, F. Hoffmann-La Roche Ltd., Basel, Switzerland