AI Summary
Large language models struggle with complex scientific tasks such as simulation and decision-making, and building agent workflows that integrate foundation models, cloud platforms, and external tools remains highly challenging. This work proposes the first fully cloud-deployed, domain- and model-agnostic multi-agent framework for scientific assistance, in which a supervisory agent dynamically orchestrates multiple specialized agents to enable end-to-end automation, from literature review and data analysis to simulation experiments. The framework supports automatic task routing, cross-tool collaboration, and cost-transparent deployment. Evaluated on a synthetic benchmark and a chemistry benchmark, it achieves 90% task-routing accuracy and task completion rates of 97.5% on synthetic tasks and 91% on real-world tasks, with accuracy better than or comparable to most frontier models, and is further checked through expert validation.
Abstract
As Large Language Models (LLMs) become ubiquitous across scientific domains, their inability to perform complex tasks such as running simulations or making complex decisions limits their utility. LLM-based agents bridge this gap by calling external resources and tools, and are therefore rapidly gaining popularity. However, designing a workflow that balances models, cloud providers, and external resources is very challenging, which can make implementing an agentic system more of a hindrance than a help. In this work, we present a domain-agnostic, model-independent workflow for an agentic framework that acts as a scientific assistant and runs entirely in the cloud. Built around a supervisor agent that marshals an array of agents with individual capabilities, our framework brings together straightforward tasks such as literature review and data analysis with more complex ones such as simulation runs. We describe the framework in full, including a proof-of-concept system we built to accelerate the study of catalysts, which are highly important in chemistry and materials science. We report the cost of operating and using this framework, broken down by service. We also evaluate our system on a custom-curated synthetic benchmark and a popular chemistry benchmark, and perform expert validation. The results show that our system routes a task to the correct agent 90% of the time and successfully completes the assigned task 97.5% of the time for synthetic tasks and 91% of the time for real-world tasks, while achieving accuracy better than or comparable to most frontier models, suggesting that this is a viable framework for other scientific domains to replicate.
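The supervisor-and-specialists pattern described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the supervisor chooses an agent, so the keyword-counting policy below is only a placeholder for what would presumably be an LLM call. The names `Agent`, `Supervisor`, `build_demo_supervisor`, and the three agent labels are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Agent:
    """A specialized worker. The handler stands in for an LLM plus its tools."""
    name: str
    keywords: List[str]
    handler: Callable[[str], str]


class Supervisor:
    """Routes each task to one registered agent and returns that agent's result."""

    def __init__(self, agents: List[Agent]) -> None:
        self.agents: Dict[str, Agent] = {a.name: a for a in agents}

    def route(self, task: str) -> str:
        # Placeholder policy (an assumption): count keyword hits per agent.
        # A real supervisor would more likely ask a model to pick the agent.
        text = task.lower()
        scores = {
            name: sum(kw in text for kw in agent.keywords)
            for name, agent in self.agents.items()
        }
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            raise ValueError(f"no agent matches task: {task!r}")
        return best

    def run(self, task: str) -> Tuple[str, str]:
        # Dispatch the task and report which agent handled it.
        name = self.route(task)
        return name, self.agents[name].handler(task)


def build_demo_supervisor() -> Supervisor:
    # Hypothetical agents mirroring the task types named in the abstract.
    return Supervisor([
        Agent("literature", ["literature", "paper", "review"],
              lambda t: f"[literature] would search sources for: {t}"),
        Agent("data_analysis", ["dataset", "plot", "statistics", "analyze"],
              lambda t: f"[data_analysis] would analyze: {t}"),
        Agent("simulation", ["simulation", "simulate", "adsorption"],
              lambda t: f"[simulation] would launch a run for: {t}"),
    ])


if __name__ == "__main__":
    supervisor = build_demo_supervisor()
    print(supervisor.run("Summarize recent literature on zeolite catalysts"))
    print(supervisor.run("Run a simulation of CO adsorption on a Pt surface"))
```

Keeping `route` separate from `run` means the routing decision can be scored on its own, which is how a routing-accuracy figure could be measured independently of task completion.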