🤖 AI Summary
Much of the vast, heterogeneous body of historical pharmaceutical R&D data, particularly archives from terminated projects, remains underused, which hampers reverse translational research. To address this, we propose DiscoVerse, a multi-agent collaborative system for reverse translation. It is built on a role-specialized agent architecture that integrates large language models, semantic retrieval, cross-document association, and source-provenance tracing to enable auditable, traceable knowledge synthesis. To the authors' knowledge, DiscoVerse is the first agentic framework systematically evaluated on Roche's confidential drug-discovery corpus, which spans four decades. Across seven benchmark queries covering 180 molecules, it achieves recall ≥ 0.99 and precision between 0.71 and 0.91. Blinded expert evaluation indicates that it faithfully integrates preclinical and clinical evidence, reconstructing termination rationales and organ-specific toxicity assessments with links back to the source documents.
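The summary names semantic retrieval and source-provenance tracing as core components. Below is a minimal sketch of how a retrieval step can keep every returned passage tied to its source, assuming embedding vectors have already been computed by some model. The `Chunk` type, the `retrieve_with_provenance` function, the document IDs, and the three-dimensional toy vectors are all hypothetical; this is not DiscoVerse's actual implementation or data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A passage from an archive document, kept together with its provenance."""
    doc_id: str                   # hypothetical source identifier
    page: int                     # location of the passage inside the source
    text: str
    embedding: tuple[float, ...]  # assumed to be precomputed by an embedding model


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_with_provenance(query: tuple[float, ...], chunks: list[Chunk], k: int = 2):
    """Rank chunks by similarity to the query and return (text, citation) pairs."""
    ranked = sorted(chunks, key=lambda c: cosine(query, c.embedding), reverse=True)
    return [(c.text, f"{c.doc_id}, p. {c.page}") for c in ranked[:k]]


# Invented toy records (not Roche data); short vectors stand in for real embeddings.
chunks = [
    Chunk("report_A", 3, "Toy note on a liver finding in a rodent study.", (0.9, 0.1, 0.0)),
    Chunk("report_B", 7, "Toy note on phase I pharmacokinetics.", (0.1, 0.9, 0.1)),
    Chunk("report_C", 1, "Toy note on liver enzyme changes in a second species.", (0.8, 0.2, 0.1)),
]
results = retrieve_with_provenance((1.0, 0.0, 0.0), chunks, k=2)
for text, citation in results:
    print(f"{citation}: {text}")
```

Returning the citation alongside each passage, rather than the bare text, is one simple way to let a downstream synthesis step point back to the exact source, which is the property the summary describes as auditable.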
📝 Abstract
Pharmaceutical research and development has accumulated vast, heterogeneous archives of data. Much of this knowledge stems from discontinued programs, and reusing these archives is invaluable for reverse translation. In practice, however, such reuse is often infeasible. In this work, we introduce DiscoVerse, a multi-agent co-scientist designed to support pharmaceutical research and development. The system implements semantic retrieval, cross-document linking, and auditable synthesis on a large historical corpus from Roche. To validate our approach at real-world scale, we selected a subset of 180 molecules from the Roche research repositories, comprising more than 0.87 billion BPE tokens and spanning more than four decades of research. Because automated evaluation metrics are poorly aligned with scientific utility, we evaluate DiscoVerse through blinded expert assessment of source-linked outputs. To our knowledge, this is the first agentic framework systematically assessed on real pharmaceutical data for reverse translation, enabled by authorized access to confidential, end-to-end drug-development archives. Our contributions include role-specialized agent designs aligned with scientists' workflows; human-in-the-loop support for reverse translation; expert evaluation; and a large-scale demonstration showing promising answer accuracy and decision-making insights. In brief, across seven benchmark queries covering 180 molecules, DiscoVerse achieved near-perfect recall ($\geq 0.99$) with moderate precision ($0.71$ to $0.91$), while qualitative assessments of discontinuation rationales and organ-specific toxicity showed faithful, source-linked synthesis across preclinical and clinical evidence.
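The headline result pairs near-perfect recall with moderate precision. The sketch below uses the standard set-based definitions, precision $= |R \cap G| / |R|$ and recall $= |R \cap G| / |G|$, assuming each benchmark query is scored by comparing the set of molecules the system returns ($R$) against an expert-curated set of relevant molecules ($G$). The abstract does not specify the exact scoring protocol, and the function name, molecule IDs, and counts below are hypothetical.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall for a single query."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 10 relevant molecules; the system returns all 10 plus 2 irrelevant ones.
relevant = {f"M{i:02d}" for i in range(1, 11)}
retrieved = relevant | {"M11", "M12"}
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.83, recall=1.00
```

In this toy case every relevant molecule is found (recall 1.0), while two extra molecules pull precision down to 10/12 ≈ 0.83: the same high-recall, moderate-precision pattern the reported figures describe.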