🤖 AI Summary
This work addresses the difficulty of designing multi-agent workflows for open-ended science, where tasks lack curated training sets, reliable scalar evaluation metrics, and standardized interfaces between existing tools and agents. The authors propose AgentCo-op, a retrieval-based synthesis framework that composes reusable skills, tools, and external agents into executable workflows through typed artifact handoffs, so that independently developed scientific agents and tool repositories can interoperate without being redesigned. When execution evidence indicates failure, the framework applies bounded, self-guided local repair to the implicated components rather than rebuilding the workflow or running a global topology search. In two open-world genomics case studies (spatial transcriptomics combined with gene-set interpretation, and cross-modality marker analysis on single-cell multiome data), it builds auditable workflows from existing agents and tools. Separately, on six coding, math, and question-answering benchmarks, it achieves the best result on four and the best average score under a unified backbone setting, while consistently reducing per-task cost relative to multi-agent baselines.
📝 Abstract
Designing multi-agent workflows is especially difficult in open-ended scientific settings where tasks lack curated training sets, reliable scalar evaluation metrics, and standardized interfaces between existing tools and agents. We propose AgentCo-op, a retrieval-based synthesis framework that composes reusable skills, tools, and external agents into executable workflows through typed artifact handoffs, then applies bounded self-guided local repair to implicated components when execution evidence indicates failure. In two open-world genomics case studies, AgentCo-op composes independently developed scientific agents and external tool repositories into auditable workflows without redesigning them or running global topology search. It coordinates specialized agents for spatial transcriptomics and gene-set interpretation to enable collaborative discovery from spatial transcriptomics data, and builds a parallel workflow for cross-modality marker analysis on single-cell multiome data. AgentCo-op can also import a searched workflow as a structural prior and improve it by grounding nodes with retrieved components and applying local repair, showing that synthesis and search are complementary. On six coding, math, and question-answering benchmarks, AgentCo-op achieves the best result on four benchmarks and the best average score under a unified backbone setting, while consistently reducing per-task cost relative to multi-agent baselines. Together, these results suggest that retrieval-based synthesis can extend automated agentic workflow design beyond benchmark-optimized agent graphs to open-world workflows built from existing agents, tools, and typed artifacts.
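To make the two mechanisms named in the abstract concrete, the following is a minimal illustrative sketch, not AgentCo-op's actual implementation. It assumes a toy registry in which each component declares the artifact type it consumes and produces; a workflow is composed by chaining components whose types line up, and a failing step is repaired locally within a fixed attempt budget while the rest of the workflow is left untouched. All names here (`Artifact`, `Component`, `compose`, `execute`, `max_repairs`, and the example parsers) are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Artifact:
    kind: str   # hypothetical type tag, e.g. "csv_text"
    value: Any


@dataclass
class Component:
    name: str
    consumes: str   # artifact kind this component accepts
    produces: str   # artifact kind this component emits
    run: Callable[[Any], Any]
    # Optional hook that returns a patched replacement for this component only.
    repair: Optional[Callable[[Exception], "Component"]] = None


def compose(registry: List[Component], start: str, goal: str) -> List[Component]:
    """Breadth-first search for a chain of components whose types line up."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        kind, path = queue.popleft()
        if kind == goal:
            return path
        for comp in registry:
            if comp.consumes == kind and comp.produces not in seen:
                seen.add(comp.produces)
                queue.append((comp.produces, path + [comp]))
    raise LookupError(f"no chain from {start!r} to {goal!r}")


def execute(workflow: List[Component], artifact: Artifact,
            max_repairs: int = 2) -> Tuple[Artifact, List[str]]:
    """Run each step; on failure, repair only that step, at most max_repairs times."""
    log: List[str] = []
    for comp in workflow:
        attempts = 0
        while True:
            if artifact.kind != comp.consumes:
                raise TypeError(f"{comp.name} expects {comp.consumes!r}, got {artifact.kind!r}")
            try:
                artifact = Artifact(comp.produces, comp.run(artifact.value))
                log.append(f"{comp.name}: ok")
                break
            except Exception as exc:
                if comp.repair is None or attempts >= max_repairs:
                    raise  # repair budget exhausted or no repair available
                attempts += 1
                comp = comp.repair(exc)  # swap in the patched component
                log.append(f"{comp.name}: repaired (attempt {attempts})")
    return artifact, log


# Toy example: a strict parser fails on an empty field and is repaired locally.
def strict_parse(text: str) -> List[int]:
    return [int(tok) for tok in text.split(",")]


def lenient_parse(text: str) -> List[int]:
    return [int(tok) for tok in text.split(",") if tok.strip()]


parser = Component(
    "parse", "csv_text", "int_list", strict_parse,
    repair=lambda exc: Component("parse_lenient", "csv_text", "int_list", lenient_parse),
)
summer = Component("sum", "int_list", "total", sum)

workflow = compose([summer, parser], "csv_text", "total")
result, log = execute(workflow, Artifact("csv_text", "1,2,,3"))
print(result, log)
```

In this sketch the type tags play the role of the handoff contract, and the execution log stands in for the audit trail. The repair budget keeps the fix local to the failing step, so the composed chain itself is never re-searched.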