CMADiff: Cross-Modal Aligned Diffusion for Controllable Protein Generation

📅 2025-03-27
🤖 AI Summary
Existing AI-based protein design models predominantly rely on sequence or structural data, neglecting explicit modeling of physicochemical properties and lacking controllability under intuitive semantic conditions. To address this, we propose CMADiff, a novel framework that, for the first time, achieves three-modal alignment among protein physicochemical attributes, natural language descriptions, and amino acid sequences within a diffusion modeling paradigm. CMADiff integrates a conditional variational autoencoder (CVAE) with a conditional diffusion model and introduces BioAligner, a contrastive alignment module that jointly embeds textual semantics and quantitative physicochemical features, including hydrophobicity, net charge, and molecular weight. Generated proteins are validated with AlphaFold3 for structural plausibility and benchmarked across multiple metrics. Results demonstrate superior physicochemical validity and structural foldability compared to state-of-the-art methods. The implementation is publicly available.
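
To make the quantitative physicochemical features mentioned in the summary concrete, the following is a minimal sketch of how two of them could be computed from a sequence. It is not taken from the paper's code: the function names are hypothetical, hydrophobicity is assumed to mean the mean Kyte-Doolittle hydropathy (GRAVY), and net charge is approximated by counting charged side chains, ignoring histidine, the termini, and pH-dependent pKa effects.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def _validate(seq: str) -> str:
    """Upper-case the sequence and reject empty input or non-standard residues."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return seq


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy over the sequence (GRAVY score)."""
    seq = _validate(seq)
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def net_charge_estimate(seq: str) -> int:
    """Crude net charge: (K + R) minus (D + E). An assumption, not a pKa model."""
    seq = _validate(seq)
    positive = seq.count("K") + seq.count("R")
    negative = seq.count("D") + seq.count("E")
    return positive - negative


def physchem_features(seq: str) -> dict:
    """Bundle the features into a dict that could serve as a conditioning input."""
    return {
        "length": len(_validate(seq)),
        "gravy": gravy(seq),
        "net_charge": net_charge_estimate(seq),
    }
```

A call such as `physchem_features("MKVLAAGIE")` returns a small dictionary of numbers; in a setup like the one described, such values would be normalized before being passed to a model as conditioning.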

📝 Abstract
AI-assisted protein design has emerged as a critical tool for advancing biotechnology, as deep generative models have demonstrated their reliability in this domain. However, most existing models primarily utilize protein sequence or structural data for training, neglecting the physicochemical properties of proteins. Moreover, they offer little control over protein generation under intuitive conditions. To address these limitations, we propose CMADiff, a novel framework that enables controllable protein generation by aligning the physicochemical properties of protein sequences with text-based descriptions through a latent diffusion process. Specifically, CMADiff employs a Conditional Variational Autoencoder (CVAE) to integrate physicochemical features as conditional input, forming a robust latent space that captures biological traits. In this latent space, we apply a conditional diffusion process guided by BioAligner, a contrastive learning-based module that aligns text descriptions with protein features, enabling text-driven control over protein sequence generation. Validated by a series of evaluations, including structure prediction with AlphaFold3, the experimental results indicate that CMADiff outperforms existing methods on protein sequence generation benchmarks and holds strong potential for future applications. The implementation and code are available at https://github.com/HPC-NEAU/PhysChemDiff.
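
The abstract describes BioAligner only as a contrastive learning-based module that aligns text descriptions with protein features; it does not state the loss. As an illustration of what such an alignment objective can look like, the sketch below implements a CLIP-style symmetric InfoNCE loss in NumPy. The choice of this particular loss, the function names, and the temperature value are all assumptions, not details confirmed by the paper.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits: np.ndarray, axis: int) -> np.ndarray:
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(
    text_emb: np.ndarray,
    protein_emb: np.ndarray,
    temperature: float = 0.07,  # assumed value, borrowed from common practice
) -> float:
    """Symmetric InfoNCE: row i of text_emb is the positive for row i of protein_emb."""
    t = l2_normalize(text_emb)
    p = l2_normalize(protein_emb)
    logits = t @ p.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(logits.shape[0])        # matching pairs sit on the diagonal
    loss_text_to_protein = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_protein_to_text = -log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (loss_text_to_protein + loss_protein_to_text))
```

Minimizing a loss of this kind pulls each text embedding toward its paired protein-feature embedding and pushes it away from the other pairs in the batch, which is the general mechanism that would let a text prompt steer a conditional diffusion model in a shared embedding space.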
Problem

Research questions and friction points this paper is trying to address.

Generates proteins with controlled physicochemical properties
Aligns text descriptions with protein features for control
Improves protein sequence generation using cross-modal diffusion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aligns physicochemical properties with text descriptions
Uses Conditional Variational Autoencoder for feature integration
Employs BioAligner for text-driven protein generation control
Changjian Zhou
Key Laboratory of Agricultural Microbiology of Heilongjiang Province, Northeast Agricultural University, Harbin, 150030, China
Yuexi Qiu
Key Laboratory of Agricultural Microbiology of Heilongjiang Province, Northeast Agricultural University, Harbin, 150030, China; School of Electrical and Information, Northeast Agricultural University, Harbin, 150030, China
Tongtong Ling
Key Laboratory of Agricultural Microbiology of Heilongjiang Province, Northeast Agricultural University, Harbin, 150030, China
Jiafeng Li
Galileo Financial Technologies, Sandy, Utah, USA
Shuanghe Liu
Key Laboratory of Agricultural Microbiology of Heilongjiang Province, Northeast Agricultural University, Harbin, 150030, China
Xiangjing Wang
Key Laboratory of Agricultural Microbiology of Heilongjiang Province, Northeast Agricultural University, Harbin, 150030, China
Jia Song
Assistant Professor, University of Idaho
Cybersecurity
Wensheng Xiang
Key Laboratory of Agricultural Microbiology of Heilongjiang Province, Northeast Agricultural University, Harbin, 150030, China