Democratizing AI scientists using ToolUniverse

📅 2025-09-27
📈 Citations: 0
Influential: 0

🤖 AI Summary
Current AI scientist systems suffer from limited personalization, rigid workflows, and fragmented integration of tools and data, which hinders open, collaborative scientific discovery. To address this, we propose ToolUniverse, the first ecosystem for AI scientists that supports arbitrary language models and reasoning frameworks. It unifies more than 600 machine learning models, datasets, APIs, and scientific tools. We introduce a standardized tool invocation framework that enables automatic interface refinement, natural-language-driven tool generation, composable workflow orchestration, and iterative optimization of tool specifications. Technically, ToolUniverse integrates multimodal NLP, automated tool interface modeling, and heterogeneous model coordination, ensuring compatibility with both open- and closed-source large language models. In a familial hypercholesterolemia case study, it was used to instantiate an AI scientist that identified high-potential drug analogues. The platform is fully open-sourced, fostering community-driven development and scalable deployment of scientific AI agents.
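A standardized tool invocation framework means an AI scientist calls every tool in the same way. As a minimal sketch of that idea, and not ToolUniverse's actual API, the Python below assumes each tool is described by a name, a natural-language description, and a typed parameter schema, and that every call uses one request shape, `{"name": ..., "arguments": {...}}`. `ToolSpec`, `ToolRegistry`, and the toy `gc_content` tool are hypothetical names introduced only for illustration.

```python
# Illustrative sketch of uniform tool calling; NOT the ToolUniverse API.
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass
class ToolSpec:
    """Hypothetical uniform tool description: name, natural-language
    description, and a typed parameter schema."""
    name: str
    description: str
    parameters: Dict[str, type]   # parameter name -> expected Python type
    func: Callable[..., Any]      # the implementation behind the tool
    required: Tuple[str, ...] = ()


class ToolRegistry:
    """Every registered tool is invoked the same way: a tool name plus a dict of arguments."""

    def __init__(self) -> None:
        self._tools: Dict[str, ToolSpec] = {}

    def register(self, spec: ToolSpec) -> None:
        self._tools[spec.name] = spec

    def call(self, request: Dict[str, Any]) -> Any:
        # Same request shape for every tool: {"name": ..., "arguments": {...}}
        name = request.get("name")
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name!r}")
        spec = self._tools[name]
        args = request.get("arguments", {})
        # Validate against the schema before executing anything.
        missing = [p for p in spec.required if p not in args]
        if missing:
            raise ValueError(f"{name}: missing required arguments {missing}")
        for key, value in args.items():
            if key not in spec.parameters:
                raise ValueError(f"{name}: unexpected argument {key!r}")
            if not isinstance(value, spec.parameters[key]):
                raise TypeError(f"{name}: {key!r} must be {spec.parameters[key].__name__}")
        return spec.func(**args)


def gc_content(sequence: str) -> float:
    """Toy scientific tool (hypothetical): fraction of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


registry = ToolRegistry()
registry.register(ToolSpec(
    name="gc_content",
    description="Compute the GC fraction of a DNA sequence",
    parameters={"sequence": str},
    func=gc_content,
    required=("sequence",),
))
print(registry.call({"name": "gc_content", "arguments": {"sequence": "GGCA"}}))  # 0.75
```

Checking arguments against the schema before execution is one plausible benefit of a uniform interface: a model-generated call with a missing or mistyped argument fails with a clear error instead of silently running the wrong computation.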

📝 Abstract
AI scientists are emerging computational systems that serve as collaborative partners in discovery. These systems remain difficult to build because they are bespoke, tied to rigid workflows, and lack shared environments that unify tools, data, and analyses into a common ecosystem. In omics, unified ecosystems have transformed research by enabling interoperability, reuse, and community-driven development; AI scientists require comparable infrastructure. We present ToolUniverse, an ecosystem for building AI scientists from any language or reasoning model, whether open or closed. ToolUniverse standardizes how AI scientists identify and call tools, integrating more than 600 machine learning models, datasets, APIs, and scientific packages for data analysis, knowledge retrieval, and experimental design. It automatically refines tool interfaces for correct use by AI scientists, creates new tools from natural language descriptions, iteratively optimizes tool specifications, and composes tools into agentic workflows. In a case study of hypercholesterolemia, ToolUniverse was used to create an AI scientist to identify a potent analog of a drug with favorable predicted properties. The open-source ToolUniverse is available at https://aiscientist.tools.
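Before an AI scientist can call a tool, it has to identify the right one among hundreds. The sketch below uses plain keyword overlap between a query and each tool's name and description as a stand-in retrieval method. This ranking scheme is an assumption for illustration, not how ToolUniverse finds tools, and `CATALOG`, `find_tools`, and the tool names in it are hypothetical.

```python
# Illustrative keyword-overlap tool finder; NOT the ToolUniverse tool-finding method.
import re
from typing import Dict, List, Set, Tuple

# Hypothetical catalog: tool name -> natural-language description.
CATALOG: Dict[str, str] = {
    "drug_target_lookup": "Retrieve known protein targets for a drug",
    "admet_predictor": "Predict absorption, toxicity and other ADMET properties of a molecule",
    "literature_search": "Search biomedical literature for papers matching a query",
    "similar_compound_search": "Find structurally similar compounds and analogs of a drug",
}

# Very common words carry no signal about which tool is relevant.
STOPWORDS = {"a", "an", "the", "of", "for", "and", "with", "to", "in"}


def tokenize(text: str) -> Set[str]:
    """Lowercase word tokens minus stopwords; a deliberately simple stand-in for real retrieval."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def find_tools(query: str, catalog: Dict[str, str], top_k: int = 2) -> List[Tuple[str, int]]:
    """Rank tools by how many query words appear in their name or description."""
    query_tokens = tokenize(query)
    scored = []
    for name, description in catalog.items():
        tool_tokens = tokenize(name.replace("_", " ") + " " + description)
        overlap = len(query_tokens & tool_tokens)
        if overlap:
            scored.append((name, overlap))
    # Highest overlap first; ties broken alphabetically so results are deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_k]


print(find_tools("find analogs of a drug with low toxicity", CATALOG))
# [('similar_compound_search', 3), ('admet_predictor', 1)]
```

Keyword overlap is only the simplest baseline; embedding similarity or asking a language model to choose among candidate descriptions are common alternatives when tool descriptions use different wording from the query.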
Problem

Building customizable AI scientists with flexible workflows
Creating shared environments unifying tools, data, and analyses
Standardizing tool identification and integration for AI scientists
Innovation

Standardizes tool identification and calling for AI scientists
Automatically refines tool interfaces and creates new tools
Composes tools into agentic workflows for scientific discovery
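Composing tools into a workflow can be pictured as threading a shared state through a sequence of tool calls, where each step reads some keys and writes one new key. The sketch below assumes that step format, a tuple of (tool function, input keys, output key). `compose` and the three toy DNA tools are hypothetical and are not taken from ToolUniverse.

```python
# Illustrative linear workflow composition; NOT the ToolUniverse workflow API.
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Hypothetical step format: (tool function, state keys it reads, state key it writes).
Step = Tuple[Callable[..., Any], Sequence[str], str]


def compose(steps: List[Step]) -> Callable[[Dict[str, Any]], Dict[str, Any]]:
    """Chain tools into one workflow that threads a shared state dict through every step."""
    def run(initial: Dict[str, Any]) -> Dict[str, Any]:
        state = dict(initial)  # copy so the caller's dict is not mutated
        for tool, input_keys, output_key in steps:
            state[output_key] = tool(*(state[k] for k in input_keys))
        return state
    return run


# Toy tools standing in for real scientific tools (hypothetical, for illustration only).
def normalize(raw: str) -> str:
    """Strip whitespace and uppercase a DNA sequence."""
    return raw.strip().upper()


def gc_content(seq: str) -> float:
    """Fraction of G and C bases."""
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def classify(gc: float, threshold: float) -> str:
    """Label a sequence relative to a GC threshold."""
    return "GC-rich" if gc > threshold else "not GC-rich"


pipeline = compose([
    (normalize, ["raw"], "sequence"),
    (gc_content, ["sequence"], "gc"),
    (classify, ["gc", "threshold"], "label"),
])

result = pipeline({"raw": "  ggcat ", "threshold": 0.5})
print(result["sequence"], result["gc"], result["label"])  # GGCAT 0.6 GC-rich
```

A fixed linear chain like this is the simplest case; an agentic workflow could additionally let a language model choose, reorder, or repeat steps based on intermediate results.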

Authors

Shanghua Gao
Harvard University
Agentic AI, AI for Science, Representation learning, Generative modeling, Computer Vision

Richard Zhu
Department of Biomedical Informatics, Harvard Medical School, Boston, MA; Harvard College, Harvard University, Cambridge, MA

Pengwei Sui
Department of Biomedical Informatics, Harvard Medical School, Boston, MA

Zhenglun Kong
Harvard University
Efficient Deep Learning, Large Language Model, AI4Science

Sufian Aldogom
Department of Biomedical Informatics, Harvard Medical School, Boston, MA

Yepeng Huang
Department of Biomedical Informatics, Harvard Medical School, Boston, MA

Ayush Noori
A.B./S.M., Harvard University; Rhodes Scholar
Artificial Intelligence, Neurodegeneration, Precision Medicine, Knowledge Graphs, Multimodal AI

Reza Shamji
Department of Biomedical Informatics, Harvard Medical School, Boston, MA; Harvard College, Harvard University, Cambridge, MA

Krishna Parvataneni
Massachusetts Institute of Technology, Cambridge, MA

Theodoros Tsiligkaridis
Senior Research Scientist, MIT Lincoln Laboratory
Deep Learning, Machine Learning, Artificial Intelligence

Marinka Zitnik
Associate Professor, Harvard University
Machine Learning, Geometric Deep Learning, Knowledge Graphs, Biomedical AI, Therapeutics