DisProtEdit: Exploring Disentangled Representations for Multi-Attribute Protein Editing

📅 2025-06-17

🤖 AI Summary

Problem: Controllable editing of protein structural and functional attributes remains challenging because existing representations entangle structural and functional semantics. Method: We propose DisProtEdit, a controllable protein editing framework that explicitly decouples structural and functional semantics through dual-channel natural language supervision, combining contrastive alignment, uniformity regularization, and a structure–function disentanglement constraint. We introduce SwissProtDis, a large-scale dual-description protein dataset built by decomposing each protein's text into separate structural and functional descriptions with a large language model, and we perform edits by modifying the text inputs and decoding from the updated latent representation. Contribution/Results: On a newly constructed multi-attribute editing benchmark, DisProtEdit achieves a both-hit rate of up to 61.7% for simultaneous structural and functional edits, and it performs competitively on protein editing and representation learning benchmarks while improving controllability and interpretability. This work offers an interpretable approach to controllable protein design.
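The headline number counts an edit as successful only when both the structural and the functional target are met. A minimal sketch of how such a rate could be tallied, assuming each edit has already been scored with two boolean hit flags; `EditOutcome`, `both_hit_rate`, and `single_hit_rates` are hypothetical names, and the benchmark's actual hit criteria are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditOutcome:
    """Hypothetical per-edit record: did the edited protein hit each target?"""
    structure_hit: bool
    function_hit: bool


def both_hit_rate(outcomes: list[EditOutcome]) -> float:
    """Fraction of edits meeting the structural AND the functional target."""
    if not outcomes:
        raise ValueError("need at least one edit outcome")
    both = sum(1 for o in outcomes if o.structure_hit and o.function_hit)
    return both / len(outcomes)


def single_hit_rates(outcomes: list[EditOutcome]) -> tuple[float, float]:
    """Per-attribute hit rates, to see which channel limits the joint score."""
    if not outcomes:
        raise ValueError("need at least one edit outcome")
    n = len(outcomes)
    structure = sum(o.structure_hit for o in outcomes) / n
    function = sum(o.function_hit for o in outcomes) / n
    return structure, function


# Toy outcomes for illustration only (not results from the paper).
toy = [
    EditOutcome(True, True),
    EditOutcome(True, False),
    EditOutcome(False, True),
    EditOutcome(True, True),
]
print(both_hit_rate(toy))     # 0.5
print(single_hit_rates(toy))  # (0.75, 0.75)
```

Because an edit must clear both checks, the both-hit rate can never exceed the smaller of the two single-attribute rates, which makes it a strict measure of coordinated editing.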

📝 Abstract

We introduce DisProtEdit, a controllable protein editing framework that leverages dual-channel natural language supervision to learn disentangled representations of structural and functional properties. Unlike prior approaches that rely on joint holistic embeddings, DisProtEdit explicitly separates semantic factors, enabling modular and interpretable control. To support this, we construct SwissProtDis, a large-scale multimodal dataset where each protein sequence is paired with two textual descriptions, one for structure and one for function, automatically decomposed using a large language model. DisProtEdit aligns protein and text embeddings using alignment and uniformity objectives, while a disentanglement loss promotes independence between structural and functional semantics. At inference time, protein editing is performed by modifying one or both text inputs and decoding from the updated latent representation. Experiments on protein editing and representation learning benchmarks demonstrate that DisProtEdit performs competitively with existing methods while providing improved interpretability and controllability. On a newly constructed multi-attribute editing benchmark, the model achieves a both-hit success rate of up to 61.7%, highlighting its effectiveness in coordinating simultaneous structural and functional edits.
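The alignment and uniformity objectives mentioned above are commonly written as in Wang and Isola (2020): alignment is the mean distance between positive pairs on the unit sphere, and uniformity is the log of the average Gaussian potential between embeddings. The sketch below computes both in plain Python, assuming L2-normalized embeddings and the usual defaults alpha = 2 and t = 2; whether DisProtEdit uses exactly these forms is not stated here. The disentanglement term is a hypothetical stand-in (mean squared cosine similarity between a protein's structure and function text embeddings), not the paper's loss.

```python
import math
from itertools import combinations

Vector = list[float]


def l2_normalize(v: Vector) -> Vector:
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def alignment_loss(protein: list[Vector], text: list[Vector], alpha: float = 2.0) -> float:
    """Mean ||p_i - t_i||^alpha over positive (protein, text) pairs, on unit vectors."""
    pairs = [(l2_normalize(p), l2_normalize(t)) for p, t in zip(protein, text)]
    return sum(sq_dist(p, t) ** (alpha / 2) for p, t in pairs) / len(pairs)


def uniformity_loss(embeddings: list[Vector], t: float = 2.0) -> float:
    """log of the mean exp(-t * ||u - v||^2) over all distinct pairs of unit vectors."""
    unit = [l2_normalize(e) for e in embeddings]
    potentials = [math.exp(-t * sq_dist(u, v)) for u, v in combinations(unit, 2)]
    return math.log(sum(potentials) / len(potentials))


def disentanglement_loss(struct_text: list[Vector], func_text: list[Vector]) -> float:
    """HYPOTHETICAL stand-in: mean squared cosine similarity between each protein's
    structure and function text embeddings (0 when every pair is orthogonal)."""
    total = 0.0
    for s, f in zip(struct_text, func_text):
        cos = sum(x * y for x, y in zip(l2_normalize(s), l2_normalize(f)))
        total += cos * cos
    return total / len(struct_text)


print(alignment_loss([[1.0, 0.0]], [[0.0, 1.0]]))            # 2.0
print(uniformity_loss([[1.0, 0.0], [-1.0, 0.0]]))            # -8.0
print(disentanglement_loss([[1.0, 0.0]], [[0.0, 1.0]]))      # 0.0
```

Lower alignment pulls matched protein and text embeddings together, while lower (more negative) uniformity spreads embeddings over the sphere. In training, such terms would be weighted and summed; the weights, and which embedding pairs enter each term, are design choices the source does not specify.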

Problem

Research questions and friction points this paper is trying to address.

Controllable protein editing using disentangled representations

Separating structural and functional properties for modular control

Improving interpretability in multi-attribute protein editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-channel natural language supervision for disentanglement

Alignment and uniformity objectives for aligning protein and text embeddings

Modular control via separate structural and functional edits
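Editing is described as modifying one or both text inputs and decoding from the updated latent representation. A minimal sketch of the channel-wise update, assuming (hypothetically) a latent with one slot per semantic channel and a linear blend toward the edited text's embedding; `DualLatent`, `blend`, and `edit_latent` are illustrative names, concatenation as decoder input is an assumption, and the decoder itself is omitted.

```python
from dataclasses import dataclass, replace
from typing import Optional

Vec = tuple[float, ...]


@dataclass(frozen=True)
class DualLatent:
    """HYPOTHETICAL latent with one slot per semantic channel."""
    structure: Vec
    function: Vec

    def as_decoder_input(self) -> Vec:
        # Assumption: the decoder consumes the two channels concatenated.
        return self.structure + self.function


def blend(old: Vec, target: Vec, strength: float) -> Vec:
    """Move `old` toward `target`; strength=1.0 replaces it outright."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return tuple((1.0 - strength) * o + strength * t for o, t in zip(old, target))


def edit_latent(
    latent: DualLatent,
    structure_text: Optional[Vec] = None,
    function_text: Optional[Vec] = None,
    strength: float = 1.0,
) -> DualLatent:
    """Update only the channel(s) whose description was edited."""
    updated = latent
    if structure_text is not None:
        updated = replace(updated, structure=blend(latent.structure, structure_text, strength))
    if function_text is not None:
        updated = replace(updated, function=blend(latent.function, function_text, strength))
    return updated


z = DualLatent(structure=(1.0, 0.0), function=(0.0, 1.0))
edited = edit_latent(z, structure_text=(0.0, 1.0))
print(edited.as_decoder_input())  # (0.0, 1.0, 0.0, 1.0)
```

Keeping the channels in separate slots is what makes the edit modular in this sketch: changing the structural description leaves the functional slot bit-for-bit unchanged, so any change in the decoded protein's function would have to come from the decoder rather than from the edit itself.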

🔎 Similar Papers

GOProteinGNN: Leveraging Protein Knowledge Graphs for Protein Representation Learning