AI Summary
Existing reaction condition recommendation methods lack interpretability, hindering their adoption in high-stakes scientific decision-making. To address this, we propose ChemMAS, the first framework to integrate a multi-agent debate mechanism into chemical reaction reasoning. ChemMAS reframes condition recommendation as an evidence-based, verifiable inference task built on mechanism anchoring, multi-channel backtracking, constraint-aware debate, and rationale aggregation. It combines large language models, mechanistic modeling, evidence retrieval, and constraint optimization to generate human-readable, traceable, and empirically verifiable reasoning chains. On standard benchmarks, ChemMAS achieves a 20–35% improvement over domain-specific baselines and surpasses general-purpose large language models by 10–15% in Top-1 accuracy. This work establishes a new paradigm for interpretable AI in scientific discovery, enabling rigorous, transparent, and trustworthy chemical reasoning.
Abstract
Reaction condition recommendation is the task of selecting appropriate condition parameters for a chemical reaction, and it is pivotal to accelerating chemical science. With the rapid development of large language models (LLMs), there is growing interest in leveraging their reasoning and planning capabilities for reaction condition recommendation. Despite their success, existing methods rarely explain the rationale behind the recommended reaction conditions, limiting their utility in high-stakes scientific workflows. In this work, we propose ChemMAS, a multi-agent system that reframes condition prediction as an evidence-based reasoning task. ChemMAS decomposes the task into mechanistic grounding, multi-channel recall, constraint-aware agentic debate, and rationale aggregation. Each decision is backed by interpretable justifications grounded in chemical knowledge and retrieved precedents. Experiments show that ChemMAS achieves 20–35% gains over domain-specific baselines and outperforms general-purpose LLMs by 10–15% in Top-1 accuracy while offering falsifiable, human-trustable rationales, establishing a new paradigm for explainable AI in scientific discovery.
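The four-stage decomposition above (mechanistic grounding, multi-channel recall, constraint-aware debate, rationale aggregation) can be sketched as a toy pipeline. This is a hedged illustration only: all function names, the candidate table, and the constraint format are hypothetical stand-ins invented for this sketch, not ChemMAS's actual API or data.

```python
# Illustrative sketch of a four-stage condition-recommendation pipeline.
# All names and data here are hypothetical, not ChemMAS's real interface.
from dataclasses import dataclass, field

@dataclass
class Candidate:
    condition: str                      # e.g. a base/solvent combination
    evidence: list = field(default_factory=list)
    score: float = 0.0

def mechanistic_grounding(reaction: str) -> str:
    # Stage 1: anchor the reaction to a plausible mechanism class
    # (toy rule standing in for real mechanistic analysis).
    return "nucleophilic_substitution" if "Br" in reaction else "generic"

def multi_channel_recall(mechanism: str) -> list:
    # Stage 2: recall candidate conditions from multiple channels
    # (toy lookup table standing in for precedent retrieval).
    table = {
        "nucleophilic_substitution": ["K2CO3/DMF", "NaH/THF"],
        "generic": ["EtOH/reflux"],
    }
    return [Candidate(c) for c in table[mechanism]]

def constraint_aware_debate(cands, constraints):
    # Stage 3: agents argue over candidates; a violated constraint
    # vetoes a candidate, supporting evidence raises its score.
    surviving = []
    for c in cands:
        if any(bad in c.condition for bad in constraints.get("forbidden", [])):
            continue  # vetoed by a constraint-checking agent
        c.evidence.append(f"precedent supports {c.condition}")
        c.score = len(c.evidence)
        surviving.append(c)
    return surviving

def rationale_aggregation(cands):
    # Stage 4: rank the survivors and emit a traceable rationale.
    best = max(cands, key=lambda c: c.score)
    return best.condition, "; ".join(best.evidence)

reaction = "R-Br + NuH"
cands = multi_channel_recall(mechanistic_grounding(reaction))
top1, rationale = rationale_aggregation(
    constraint_aware_debate(cands, {"forbidden": ["NaH"]})
)
print(top1, "|", rationale)
```

The point of the sketch is the control flow: every recommendation that survives to stage 4 carries the evidence and constraint checks that produced it, which is what makes the final rationale falsifiable.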