Beyond Protein Language Models: An Agentic LLM Framework for Mechanistic Enzyme Design

📅 2025-11-24
🤖 AI Summary
Problem: Generating mechanistic, testable scientific hypotheses in protein design remains slow, particularly for metalloproteins such as ferredoxins, because sequence–structure–function relationships are complex. Method: We introduce Genie-CAT, a tool-augmented, agent-based large language model (LLM) framework that integrates retrieval-augmented generation (RAG), structural parsing of Protein Data Bank files, electrostatic potential computation, and machine learning–based prediction of redox properties, coupling symbolic reasoning with physics-informed computation. Contribution/Results: Unlike conventional conversational LLMs, Genie-CAT operates as an autonomous scientific agent that constructs interpretable, experimentally verifiable mechanistic hypotheses. In benchmarking on iron–sulfur proteins, it recapitulated expert-level inference within hours, autonomously identifying key residues that modulate [Fe–S] cluster redox behavior. Hypothesis generation throughput improved by over an order of magnitude compared to manual approaches.

📝 Abstract
We present Genie-CAT, a tool-augmented large-language-model (LLM) system designed to accelerate scientific hypothesis generation in protein design. Using metalloproteins (e.g., ferredoxins) as a case study, Genie-CAT integrates four capabilities (literature-grounded reasoning through retrieval-augmented generation (RAG), structural parsing of Protein Data Bank files, electrostatic potential calculations, and machine-learning prediction of redox properties) into a unified agentic workflow. By coupling natural-language reasoning with data-driven and physics-based computation, the system generates mechanistically interpretable, testable hypotheses linking sequence, structure, and function. In proof-of-concept demonstrations, Genie-CAT autonomously identifies residue-level modifications near [Fe–S] clusters that affect redox tuning, reproducing expert-derived hypotheses in a fraction of the time. The framework highlights how AI agents combining language models with domain-specific tools can bridge symbolic reasoning and numerical simulation, transforming LLMs from conversational assistants into partners for computational discovery.
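
As a rough illustration of what a "unified agentic workflow" over four tools can look like, the sketch below registers four stub tools and runs a fixed plan against them. It is not the Genie-CAT implementation: the tool names (`literature_rag`, `pdb_parser`, `electrostatics`, `redox_ml`), the `Tool` class, and the `run_plan` helper are all hypothetical, and the plan is hard-coded here where a real agent would have the LLM choose each step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Tool:
    """A named capability the agent can call with a text argument."""
    name: str
    description: str
    run: Callable[[str], str]


def build_registry() -> Dict[str, Tool]:
    # Hypothetical stand-ins for the four capabilities named in the abstract.
    # Each stub only echoes its input; real tools would do actual work.
    tools = [
        Tool("literature_rag", "Retrieve supporting literature passages",
             lambda q: f"[rag stub] {q}"),
        Tool("pdb_parser", "Parse a Protein Data Bank file",
             lambda q: f"[pdb stub] {q}"),
        Tool("electrostatics", "Compute electrostatic potential",
             lambda q: f"[electrostatics stub] {q}"),
        Tool("redox_ml", "Predict redox properties with an ML model",
             lambda q: f"[redox stub] {q}"),
    ]
    return {tool.name: tool for tool in tools}


def run_plan(plan: List[Tuple[str, str]], registry: Dict[str, Tool]) -> List[str]:
    """Execute (tool_name, argument) steps in order and collect observations."""
    observations = []
    for tool_name, argument in plan:
        tool = registry.get(tool_name)
        if tool is None:
            # Report unknown tools instead of raising, so the plan can continue.
            observations.append(f"unknown tool: {tool_name}")
            continue
        observations.append(tool.run(argument))
    return observations


if __name__ == "__main__":
    registry = build_registry()
    plan = [
        ("pdb_parser", "ferredoxin structure file"),
        ("electrostatics", "region around the cluster"),
        ("redox_ml", "candidate mutation"),
    ]
    for line in run_plan(plan, registry):
        print(line)
```

Keeping every tool behind the same text-in, text-out interface is one common design choice for agent frameworks, since it lets the language model treat each result as another observation to reason over.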
Problem

Research questions and friction points this paper is trying to address.

Accelerating scientific hypothesis generation for protein design
Generating mechanistically interpretable links between sequence, structure, and function
Autonomously identifying residue-level modifications affecting enzyme redox properties
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates RAG with structural parsing and calculations
Combines language reasoning with physics-based computation
Autonomously identifies residue modifications for redox tuning
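
One simple way to shortlist residues "near" an iron–sulfur cluster is a distance cutoff applied to coordinates parsed from a PDB file. The sketch below shows that idea only; it is not the method used by Genie-CAT. The 6.0 Å cutoff, the set of cluster residue names (`SF4`, `FES`, `F3S`), and the function names are assumptions chosen for illustration. The column slices follow the fixed-width ATOM/HETATM layout of the PDB format.

```python
import math
from typing import List, Tuple

# Assumed residue names for iron-sulfur clusters in PDB files (illustrative).
CLUSTER_RESNAMES = {"SF4", "FES", "F3S"}

Atom = Tuple[str, str, int, Tuple[float, float, float]]


def parse_atoms(pdb_text: str) -> List[Atom]:
    """Extract (resname, chain, resseq, xyz) from ATOM/HETATM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 54:
            continue
        resname = line[17:20].strip()   # columns 18-20
        chain = line[21]                # column 22
        resseq = int(line[22:26])       # columns 23-26
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((resname, chain, resseq, xyz))
    return atoms


def residues_near_cluster(pdb_text: str, cutoff: float = 6.0) -> List[Tuple[str, int, str]]:
    """Return sorted (chain, resseq, resname) with any atom within cutoff (in angstroms) of a cluster atom."""
    atoms = parse_atoms(pdb_text)
    cluster_xyz = [xyz for resname, _, _, xyz in atoms if resname in CLUSTER_RESNAMES]
    near = set()
    for resname, chain, resseq, xyz in atoms:
        if resname in CLUSTER_RESNAMES:
            continue  # skip the cluster's own atoms
        if any(math.dist(xyz, c) <= cutoff for c in cluster_xyz):
            near.add((chain, resseq, resname))
    return sorted(near)
```

A shortlist like this would only be a starting point: deciding which of the nearby residues actually tune the redox potential is the part that, per the abstract, draws on electrostatics, ML prediction, and literature-grounded reasoning.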
Authors

Bruno Jacob
Pacific Northwest National Laboratory
Computational Physics, Scientific Machine Learning
Khushbu Agarwal
Pacific Northwest National Laboratory
Marcel Baer
Pacific Northwest National Laboratory
Peter Rice
Pacific Northwest National Laboratory
Simone Raugei
Pacific Northwest National Laboratory