De novo generation of functional terpene synthases using TpsGPT

📅 2025-12-09
🤖 AI Summary
Problem: De novo design of terpene synthases (TPSs) remains slow and costly because it relies on labor-intensive, low-throughput approaches such as directed evolution. Method: We fine-tune the protein language model ProtGPT2 on a TPS-specific dataset to obtain TpsGPT, then pass the generated sequences through a multi-tiered structure and function filtering pipeline: ESMFold-based 3D structure prediction, Foldseek-based structural similarity screening, InterPro domain validation, and EnzymeExplorer-based classification. Contribution/Results: From 28,000 generated sequences, seven candidates passed all filters; experimental characterization confirmed TPS activity for at least two of them, and the functional sequences are evolutionarily distant from known TPSs. The approach offers a scalable alternative to directed evolution for de novo design of functional TPSs.
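The structure-confidence step can be illustrated with a minimal sketch. It assumes the structure predictor writes per-residue pLDDT into the B-factor column of its PDB output, which is common for predicted structures but should be checked for the tool version in use, including whether the scale is 0-100 or 0-1. The helper names `mean_plddt` and `atom_line`, the toy records, and the cutoff of 70 are hypothetical and are not taken from the paper.

```python
def mean_plddt(pdb_text: str) -> float:
    """Average the B-factor column over C-alpha ATOM records of a PDB string."""
    scores = []
    for line in pdb_text.splitlines():
        # Fixed-width PDB columns: atom name in 13-16, B-factor in 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores.append(float(line[60:66]))
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores) / len(scores)


def atom_line(serial: int, name: str, resname: str, resseq: int, bfactor: float) -> str:
    """Build a minimal fixed-width ATOM record (coordinates zeroed) for the demo."""
    return (
        f"ATOM  {serial:>5} {name:<4} {resname:>3} A{resseq:>4}    "
        f"{0.0:>8.3f}{0.0:>8.3f}{0.0:>8.3f}{1.0:>6.2f}{bfactor:>6.2f}"
    )


# Toy two-residue structure; the values are made up for illustration.
toy_pdb = "\n".join([
    atom_line(1, "N", "MET", 1, 10.0),   # backbone nitrogen, ignored
    atom_line(2, "CA", "MET", 1, 90.0),
    atom_line(3, "CA", "ALA", 2, 70.0),
])

PLDDT_CUTOFF = 70.0  # hypothetical threshold, not taken from the paper
score = mean_plddt(toy_pdb)
print(f"mean pLDDT = {score:.1f}, passes = {score >= PLDDT_CUTOFF}")
```

Averaging over C-alpha atoms gives one value per residue, so long side chains do not weight the mean more heavily than short ones.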

📝 Abstract
Terpene synthases (TPS) are a key family of enzymes responsible for generating the diverse terpene scaffolds that underpin many natural products, including front-line anticancer drugs such as Taxol. However, de novo TPS design through directed evolution is costly and slow. We introduce TpsGPT, a generative model for scalable TPS protein design, built by fine-tuning the protein language model ProtGPT2 on 79k TPS sequences mined from UniProt. TpsGPT generated de novo enzyme candidates in silico and we evaluated them using multiple validation metrics, including EnzymeExplorer classification, ESMFold structural confidence (pLDDT), sequence diversity, CLEAN classification, InterPro domain detection, and Foldseek structure alignment. From an initial pool of 28k generated sequences, we identified seven putative TPS enzymes that satisfied all validation criteria. Experimental validation confirmed TPS enzymatic activity in at least two of these sequences. Our results show that fine-tuning of a protein language model on a carefully curated, enzyme-class-specific dataset, combined with rigorous filtering, can enable the de novo generation of functional, evolutionarily distant enzymes.
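The idea of keeping only sequences that satisfy every validation criterion can be sketched as a sequential filter funnel. This is an illustrative sketch under stated assumptions, not the authors' pipeline: the `Candidate` fields stand in for precomputed outputs of the tools named in the abstract, and every threshold, field name, and toy record below is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """Precomputed scores for one generated sequence (all fields hypothetical)."""
    seq_id: str
    tps_score: float      # TPS classifier score in [0, 1]
    mean_plddt: float     # structure confidence in [0, 100]
    max_identity: float   # highest identity to any known sequence, in [0, 1]
    ec_is_tps: bool       # EC-number prediction consistent with a TPS
    has_tps_domain: bool  # domain annotation hit
    fold_match: bool      # structural alignment to a known TPS fold


Filter = Tuple[str, Callable[[Candidate], bool]]

# Thresholds are placeholders chosen for the demo, not values from the paper.
FILTERS: List[Filter] = [
    ("classifier", lambda c: c.tps_score >= 0.5),
    ("structure confidence", lambda c: c.mean_plddt >= 70.0),
    ("novelty", lambda c: c.max_identity <= 0.8),
    ("EC class", lambda c: c.ec_is_tps),
    ("domain", lambda c: c.has_tps_domain),
    ("fold match", lambda c: c.fold_match),
]


def run_funnel(candidates, filters=FILTERS):
    """Apply filters in order; return survivors and the count left after each step."""
    survivors = list(candidates)
    counts = []
    for name, keep in filters:
        survivors = [c for c in survivors if keep(c)]
        counts.append((name, len(survivors)))
    return survivors, counts


# Made-up records, one per typical failure mode.
pool = [
    Candidate("gen_a", 0.92, 85.0, 0.45, True, True, True),
    Candidate("gen_b", 0.30, 88.0, 0.40, True, True, True),   # low classifier score
    Candidate("gen_c", 0.95, 55.0, 0.50, True, True, True),   # low pLDDT
    Candidate("gen_d", 0.97, 91.0, 0.98, True, True, True),   # near-copy of a known sequence
    Candidate("gen_e", 0.90, 80.0, 0.50, True, False, True),  # no domain hit
]

survivors, counts = run_funnel(pool)
for name, remaining in counts:
    print(f"{name:>22}: {remaining} remaining")
print("survivors:", [c.seq_id for c in survivors])
```

Recording the count after each step makes it easy to see which criterion removes the most candidates, which is useful when a pool of tens of thousands of sequences shrinks to a handful.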
Problem

Research questions and friction points this paper is trying to address.

Generates novel terpene synthase enzymes computationally
Overcomes costly and slow directed evolution methods
Validates enzyme function through multi-metric and experimental testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned protein language model for enzyme design
Multi-metric computational validation of generated sequences
De novo creation of functional terpene synthases
Authors
Hamsini Ramanathan
Seattle Academy of Arts and Sciences (SAAS), Seattle
Roman Bushuiev
Czech Institute of Informatics, Robotics and Cybernetics (CIIRC), Czech Technical University
Matouš Soldát
Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences
Jiří Kohout
Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences
Téo Hebra
Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences
Joshua David Smith
Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences
Josef Sivic
Czech Technical University, CIIRC, ELLIS Unit Prague
computer vision, machine learning
Tomáš Pluskal
Institute of Organic Chemistry and Biochemistry of the Czech Academy of Sciences