AI Summary
Current machine learning approaches for protein design struggle to accommodate non-canonical amino acids and multiple design objectives, which limits their generality. This work proposes Agent Rosetta, a framework that provides a structured interactive environment in which large language model (LLM) agents can operate the Rosetta physics-based modeling suite. By combining the reasoning capabilities of LLMs with Rosetta's biophysical modeling, the system supports an iterative, goal-driven autonomous design process. Agent Rosetta matches specialized models and expert baselines in design with canonical amino acids, and it achieves comparable performance on tasks involving non-canonical residues, where existing ML approaches fail. These results demonstrate its broad applicability.
Abstract
Large language models (LLMs) are capable of emulating reasoning and using tools, creating opportunities for autonomous agents that execute complex scientific tasks. Protein design provides a natural testbed: although machine learning (ML) methods achieve strong results, they are largely restricted to canonical amino acids and narrow objectives, leaving an unfilled need for a generalist tool for broad design pipelines. We introduce Agent Rosetta, an LLM agent paired with a structured environment for operating Rosetta, the leading physics-based heteropolymer design software, which is capable of modeling non-canonical building blocks and geometries. Agent Rosetta iteratively refines designs to achieve user-defined objectives, combining LLM reasoning with Rosetta's generality. We evaluate Agent Rosetta on design with canonical amino acids, where it matches specialized models and expert baselines, and on design with non-canonical residues, where ML approaches fail and Agent Rosetta achieves comparable performance. Critically, prompt engineering alone often fails to generate Rosetta actions, demonstrating that environment design is essential for integrating LLM agents with specialized software. Our results show that properly designed environments enable LLM agents to make scientific software accessible while matching specialized tools and human experts.
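The abstract's central claim is that free-form LLM output is not reliably executable by specialized software, and that a structured environment closes that gap. The Python sketch below is a hypothetical illustration of that idea. It is not Agent Rosetta's implementation. Every name in it (the JSON `mutate` schema, `parse_action`, `toy_score`, `run_episode`) is invented here. A fixed list of replies stands in for the LLM, and an arbitrary toy scoring function stands in for Rosetta. The environment accepts only schema-valid actions. When a reply is malformed, it returns an error message as feedback instead of crashing. It keeps a change only when the change improves the objective. `AIB` (aminoisobutyric acid) and `NLE` (norleucine) are used as example non-canonical residue codes.

```python
import json
from dataclasses import dataclass

# Hypothetical action space: canonical plus two non-canonical residue codes (illustrative only).
ALLOWED_RESIDUES = {"ALA", "LEU", "LYS", "GLU", "AIB", "NLE"}


@dataclass(frozen=True)
class Mutate:
    position: int
    residue: str


def parse_action(text: str, length: int) -> Mutate:
    """Parse an agent reply into a validated action, or raise ValueError."""
    data = json.loads(text)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or data.get("action") != "mutate":
        raise ValueError("expected a JSON object with action 'mutate'")
    pos, res = data.get("position"), data.get("residue")
    if isinstance(pos, bool) or not isinstance(pos, int) or not 0 <= pos < length:
        raise ValueError(f"position out of range: {pos!r}")
    if res not in ALLOWED_RESIDUES:
        raise ValueError(f"residue not allowed: {res!r}")
    return Mutate(pos, res)


def toy_score(seq):
    """Arbitrary stand-in objective (NOT a Rosetta energy); lower is better."""
    return -sum(r in {"AIB", "LEU"} for r in seq)


def run_episode(seq, replies, max_steps=10):
    """Apply agent replies one at a time, keeping only improving mutations."""
    seq = list(seq)
    best = toy_score(seq)
    log = []
    for reply in replies[:max_steps]:
        try:
            act = parse_action(reply, len(seq))
        except ValueError as err:
            log.append(f"rejected: {err}")  # feedback the agent would receive
            continue
        trial = seq.copy()
        trial[act.position] = act.residue
        score = toy_score(trial)
        if score < best:
            seq, best = trial, score
            log.append(f"accepted {act}: score {score}")
        else:
            log.append(f"reverted {act}: score {score} >= {best}")
    return seq, best, log


# Mock agent replies: two valid actions, one free-form sentence, one out-of-range position.
replies = [
    '{"action": "mutate", "position": 1, "residue": "AIB"}',
    "Sure! I will mutate residue 2 to leucine.",
    '{"action": "mutate", "position": 9, "residue": "LEU"}',
    '{"action": "mutate", "position": 2, "residue": "LEU"}',
]
seq, score, log = run_episode(["ALA", "ALA", "LYS", "GLU"], replies)
for line in log:
    print(line)
print(seq, score)
```

In this toy run, the free-form sentence and the out-of-range position are rejected with a message instead of being passed on to the modeling step. The two valid mutations are accepted. This is one plausible way an environment can make a general-purpose agent usable with specialized software. The action space and feedback that Agent Rosetta actually uses may differ.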