🤖 AI Summary
Existing AI-based protein design methods rely primarily on sequence and structural information and overlook the functional knowledge recorded in human-written text. This work introduces ProteinDT, a multimodal framework that brings textual descriptions of protein function into protein design through three stages: cross-modal alignment, text-driven generation of a protein representation, and decoding of that representation into a protein sequence. It constructs SwissProtCLAP, a large dataset of 441K text-protein pairs, and proposes ProteinCLAP, a model that aligns the representations of text and protein. ProteinDT achieves over 90% accuracy on text-guided protein generation, the best hit ratio on 12 zero-shot text-guided protein editing tasks, and superior performance on four of six protein property prediction benchmarks.
📝 Abstract
Current AI-assisted protein design mainly utilizes protein sequence and structure information. Meanwhile, there exists tremendous human-curated knowledge in text format describing proteins' high-level functionalities. Yet whether incorporating such text data can help protein design tasks has not been explored. To bridge this gap, we propose ProteinDT, a multi-modal framework that leverages textual descriptions for protein design. ProteinDT consists of three consecutive steps: ProteinCLAP, which aligns the representations of the two modalities; a facilitator that generates the protein representation from the text modality; and a decoder that creates protein sequences from the representation. To train ProteinDT, we construct a large dataset, SwissProtCLAP, with 441K text and protein pairs. We quantitatively verify the effectiveness of ProteinDT on three challenging tasks: (1) over 90% accuracy for text-guided protein generation; (2) best hit ratio on 12 zero-shot text-guided protein editing tasks; (3) superior performance on four out of six protein property prediction benchmarks.
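To make the alignment step more concrete, the sketch below computes a symmetric InfoNCE-style contrastive loss over a small batch of paired text and protein embeddings. This is an illustration only: the abstract does not state which objective ProteinCLAP uses, so the loss, the function name `contrastive_alignment_loss`, the `temperature` value, and the toy 2-dimensional embeddings are all assumptions rather than the authors' implementation.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_alignment_loss(text_emb, protein_emb, temperature=0.1):
    """Symmetric InfoNCE loss; row i of each input is assumed to be a matched pair."""
    t = [l2_normalize(v) for v in text_emb]
    p = [l2_normalize(v) for v in protein_emb]
    n = len(t)
    # sim[i][j]: temperature-scaled cosine similarity of text i and protein j
    sim = [
        [sum(a * b for a, b in zip(t[i], p[j])) / temperature for j in range(n)]
        for i in range(n)
    ]
    # Text -> protein direction: text i should score its own protein highest.
    t2p = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    # Protein -> text direction: same check on the transposed matrix.
    sim_t = [list(col) for col in zip(*sim)]
    p2t = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (t2p + p2t)


# Toy embeddings (hypothetical): matched pairs point in the same direction.
text = [[1.0, 0.0], [0.0, 1.0]]
aligned = contrastive_alignment_loss(text, [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_alignment_loss(text, [[0.0, 1.0], [1.0, 0.0]])
print(f"aligned pairs:  {aligned:.4f}")
print(f"shuffled pairs: {shuffled:.4f}")
```

The loss is low when each text embedding is closest to its own protein embedding and high when the pairing is shuffled, which is the behavior an alignment stage needs before a facilitator can map text representations into the protein representation space.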