🤖 AI Summary
This work addresses the challenge of unreliable and irreproducible reasoning trajectories in large language model (LLM) agents applied to drug discovery, which often stem from unconstrained tool usage and fragile long-horizon reasoning. To mitigate these issues, the authors propose Mozi, a dual-layer architecture that combines the flexibility of generative AI with the rigor of computational biology through a hierarchical supervisor-worker control plane and stateful skill graphs. Mozi incorporates role isolation, constrained action spaces, reflective replanning, and human-in-the-loop checkpoints to prevent error propagation and preserve scientific validity. Evaluated on the PharmaBench benchmark, Mozi substantially outperforms existing approaches and completes end-to-end therapeutic case studies, efficiently identifying low-toxicity candidate molecules and generating competitive virtual compounds.
📝 Abstract
Tool-augmented large language model (LLM) agents promise to unify scientific reasoning with computation, yet their deployment in high-stakes domains such as drug discovery is bottlenecked by two critical barriers: unconstrained tool-use governance and poor long-horizon reliability. In dependency-heavy pharmaceutical pipelines, autonomous agents often drift into irreproducible trajectories, where early-stage hallucinations compound into downstream failures. To overcome this, we present Mozi, a dual-layer architecture that bridges the flexibility of generative AI with the deterministic rigor of computational biology. Layer A (Control Plane) establishes a governed supervisor-worker hierarchy that enforces role-based tool isolation, limits execution to constrained action spaces, and drives reflection-based replanning. Layer B (Workflow Plane) operationalizes canonical drug discovery stages, from Target Identification to Lead Optimization, as stateful, composable skill graphs. This layer integrates strict data contracts and strategic human-in-the-loop (HITL) checkpoints to safeguard scientific validity at high-uncertainty decision boundaries.
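The role-based tool isolation and constrained action spaces described for Layer A could be enforced as a whitelist check at dispatch time, so a worker can never execute a tool outside its role. The sketch below is a minimal illustration of that idea; the role names, tool names, and `dispatch` helper are hypothetical and do not come from Mozi's actual interface.

```python
# Hypothetical sketch of Layer A's role-based tool isolation: each worker
# role is bound to an explicit tool whitelist, so the supervisor can only
# dispatch actions drawn from a constrained action space. All identifiers
# here are illustrative, not Mozi's real API.

ROLE_TOOLS = {
    "target_identification": {"fetch_uniprot", "rank_targets"},
    "lead_optimization": {"dock_ligand", "predict_admet"},
}

class ToolIsolationError(Exception):
    pass

def dispatch(role: str, tool: str, payload: dict) -> dict:
    """Reject any tool call outside the role's whitelist before execution."""
    allowed = ROLE_TOOLS.get(role, set())
    if tool not in allowed:
        raise ToolIsolationError(f"{role!r} may not call {tool!r}")
    # A real system would invoke the tool here; this sketch echoes the action.
    return {"role": role, "tool": tool, "payload": payload}

# A permitted call succeeds; a cross-role call is blocked before execution.
ok = dispatch("lead_optimization", "dock_ligand", {"smiles": "CCO"})
try:
    dispatch("lead_optimization", "fetch_uniprot", {"id": "P00533"})
except ToolIsolationError as e:
    blocked = str(e)
```

Failing closed at the dispatch boundary, rather than trusting the LLM's plan, is one way an architecture like this could keep a hallucinated action from ever reaching a downstream tool.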
Operating on the design principle of "free-form reasoning for safe tasks, structured execution for long-horizon pipelines," Mozi provides built-in robustness mechanisms and trace-level auditability to mitigate error accumulation. We evaluate Mozi on PharmaBench, a curated benchmark for biomedical agents, where it achieves superior orchestration accuracy over existing baselines. Furthermore, through end-to-end therapeutic case studies, we demonstrate Mozi's ability to navigate massive chemical spaces, enforce stringent toxicity filters, and generate highly competitive in silico candidates, effectively transforming the LLM from a fragile conversationalist into a reliable, governed co-scientist.
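The "structured execution for long-horizon pipelines" half of the design principle suggests a skill graph whose stages declare what state they consume and produce (a minimal data contract) and can pause at HITL checkpoints. A minimal sketch follows, assuming illustrative stage names and contract keys that are not taken from the paper:

```python
# Hypothetical sketch of a Layer B skill graph: pipeline stages declare
# input/output keys as a minimal "data contract" and run over shared state
# in order; a stage marked as a HITL checkpoint must be approved before
# its outputs are committed. Stage names and keys are illustrative only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str
    requires: set        # keys that must already exist in shared state
    provides: set        # keys this stage's output must contain
    run: Callable[[dict], dict]
    hitl: bool = False   # pause for human sign-off before committing output

def execute(stages: list, state: dict, approve=lambda stage: True) -> dict:
    for stage in stages:
        missing = stage.requires - state.keys()
        if missing:
            raise ValueError(f"{stage.name}: missing inputs {missing}")
        out = stage.run(state)
        if stage.provides - out.keys():
            raise ValueError(f"{stage.name}: output violates data contract")
        if stage.hitl and not approve(stage):
            raise RuntimeError(f"{stage.name}: rejected at HITL checkpoint")
        state.update(out)   # commit only contract-checked, approved output
    return state

# Two toy stages: the second depends on the first and is human-gated.
pipeline = [
    Stage("target_id", set(), {"target"}, lambda s: {"target": "EGFR"}),
    Stage("lead_opt", {"target"}, {"candidates"},
          lambda s: {"candidates": ["CCO"]}, hitl=True),
]
result = execute(pipeline, {})
```

Because every stage's inputs and outputs are validated against its contract before being committed, a failure surfaces at the stage boundary where it occurred instead of silently propagating, which is the kind of trace-level auditability the abstract describes.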