ProtAlign: Contrastive Learning Paradigm for Sequence and Structure Alignment

📅 2026-03-06
📈 Citations: 0 · Influential: 0
🤖 AI Summary
Existing protein language models often neglect structural information, and conventional approaches struggle to align sequence and structure representations effectively. This work proposes a sequence–structure contrastive alignment framework that, for the first time, introduces contrastive learning to this task, constructing a shared embedding space in which proteins are represented consistently across modalities. Trained on large-scale paired sequence–structure data, the method enables cross-modal retrieval and establishes interpretable sequence–structure correspondences. It significantly outperforms current state-of-the-art approaches on downstream tasks such as functional annotation and stability prediction, and efficiently retrieves structurally similar neighbors directly from sequence inputs.
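
To make the summary's "shared embedding space" concrete, here is a minimal sketch of a symmetric InfoNCE-style contrastive objective over a batch of matched sequence–structure pairs, in the spirit of CLIP. All names, tensor shapes, and the temperature value are illustrative assumptions; the paper's actual encoders and loss details are not given here.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(seq_emb, struct_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of matched sequence-structure pairs.

    seq_emb:    (B, D) embeddings from a sequence encoder (hypothetical)
    struct_emb: (B, D) embeddings from a structure encoder (hypothetical)
    Row i of each tensor is assumed to describe the same protein.
    """
    # Normalize so dot products become cosine similarities.
    seq_emb = F.normalize(seq_emb, dim=-1)
    struct_emb = F.normalize(struct_emb, dim=-1)

    # (B, B) similarity matrix; the diagonal holds the matched pairs.
    logits = seq_emb @ struct_emb.T / temperature
    targets = torch.arange(seq_emb.size(0), device=seq_emb.device)

    # Pull matched pairs together and push unmatched pairs apart,
    # symmetrically in both retrieval directions.
    loss_seq_to_struct = F.cross_entropy(logits, targets)
    loss_struct_to_seq = F.cross_entropy(logits.T, targets)
    return 0.5 * (loss_seq_to_struct + loss_struct_to_seq)
```

In a training loop, `seq_emb` and `struct_emb` would come from two modality-specific encoders (e.g., a protein language model and a structure encoder) followed by projection heads; this mirrors the standard CLIP recipe, not an implementation detail confirmed by the paper.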

📝 Abstract
Protein language models often account for the alignment between a protein sequence and its textual description, but they do not take structural information into account. Traditional methods treat sequence and structure separately, limiting their ability to exploit the alignment between structure and protein sequence embeddings. In this paper, we introduce a sequence–structure contrastive alignment framework that learns a shared embedding space in which proteins are represented consistently across modalities. By training on large-scale pairs of sequences and experimentally resolved or predicted structures, the model maximizes agreement between matched sequence–structure pairs while pushing apart unrelated pairs. This alignment enables cross-modal retrieval (e.g., finding structural neighbors given a sequence), improves downstream prediction tasks such as function annotation and stability estimation, and provides interpretable links between sequence variation and structural organization. Our results demonstrate that contrastive learning can serve as a powerful bridge between protein sequences and structures, offering a unified representation for understanding and engineering proteins.
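
The cross-modal retrieval the abstract mentions (finding structural neighbors given a sequence) reduces to nearest-neighbor search in the shared space once the encoders are trained. The sketch below assumes precomputed, hypothetical embedding tensors; it is not the authors' retrieval pipeline.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def retrieve_structural_neighbors(query_seq_emb, structure_bank, k=5):
    """Return the k nearest structure embeddings for each query sequence.

    query_seq_emb:  (Q, D) query sequence embeddings in the shared space
    structure_bank: (N, D) precomputed structure embeddings
    """
    # Cosine similarity in the shared embedding space.
    q = F.normalize(query_seq_emb, dim=-1)
    bank = F.normalize(structure_bank, dim=-1)
    sims = q @ bank.T            # (Q, N) similarity matrix
    topk = sims.topk(k, dim=-1)  # top-k structures per query
    return topk.indices, topk.values
```

At scale, the same search would typically be delegated to an approximate nearest-neighbor index (e.g., FAISS) rather than a dense matrix product.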
Problem

Research questions and friction points this paper is trying to address.

protein sequence
protein structure
sequence-structure alignment
multimodal representation
contrastive learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

contrastive learning
protein sequence-structure alignment
shared embedding space
cross-modal retrieval
protein language models
👥 Authors
Aditya Ranganath
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Hasin Us Sami
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Kowshik Thopalli
Ph.D. Student, Arizona State University
computer vision · machine learning · deep learning · artificial intelligence · differential geometry
Bhavya Kailkhura
Research Scientist, Lawrence Livermore National Laboratory
AI Security & Alignment · Compressed & Fast AI
Wesam Sakla
Center for Applied Scientific Computing, Lawrence Livermore National Laboratory