🤖 AI Summary
Low-homology and orphan proteins have sparse multiple sequence alignment (MSA) information, which degrades structure prediction performance. To address this, we propose PLAME, a lightweight adapter that leverages pre-trained protein language models to extract evolutionary embeddings and improve MSA generation quality, introduces a joint conservation-diversity loss to optimize MSA generation, and incorporates a novel MSA filtering mechanism together with a sequence-quality assessment metric. Used as an adapter, PLAME achieves AlphaFold2-level accuracy at ESMFold's inference speed. It achieves state-of-the-art (SOTA) folding enhancement on the AlphaFold2 benchmark of low-homology and orphan proteins, with consistent improvements on AlphaFold3. Ablation studies confirm the efficacy of each component, and case analyses reveal interpretable correlations between MSA features, such as conservation patterns and diversity metrics, and final structural quality.
📝 Abstract
Protein structure prediction is essential for drug discovery and for understanding biological function. While recent advances such as AlphaFold have achieved remarkable accuracy, most folding models rely heavily on multiple sequence alignments (MSAs) to boost prediction performance. This dependency limits their effectiveness on low-homology and orphan proteins, where MSA information is sparse or unavailable. To address this limitation, we propose PLAME, a novel MSA design model that leverages evolutionary embeddings from pretrained protein language models. Unlike existing methods, PLAME uses pretrained representations to enrich evolutionary information and employs a conservation-diversity loss to improve generation quality. Additionally, we propose a novel MSA selection method that screens for high-quality MSAs to improve folding performance. We also propose a sequence quality assessment metric that provides an orthogonal perspective on MSA quality. On the AlphaFold2 benchmark of low-homology and orphan proteins, PLAME achieves state-of-the-art performance in folding enhancement and sequence quality assessment, with consistent improvements on AlphaFold3. Ablation studies validate the effectiveness of the MSA selection method, while extensive case studies on various protein types provide insights into the relationship between AlphaFold's prediction quality and MSA characteristics. Furthermore, we demonstrate that PLAME can serve as an adapter that achieves AlphaFold2-level accuracy with ESMFold's inference speed.
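The summary and abstract refer to the conservation and diversity of an MSA without defining them. As a rough illustration only, the sketch below computes two common textbook measures: per-column conservation as one minus the normalized Shannon entropy of the non-gap residues, and alignment diversity as one minus the mean pairwise sequence identity. This is an assumption for illustration, not PLAME's conservation-diversity loss or its sequence quality metric, neither of which is specified here; the function names and the gap character `-` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

GAP = "-"  # assumed gap symbol in the aligned sequences


def column_conservation(msa: list[str]) -> list[float]:
    """Per-column conservation: 1 - H / log2(20), where H is the Shannon entropy
    (bits) of the non-gap residues in the column. 1.0 means fully conserved."""
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    max_entropy = math.log2(20)  # entropy of a uniform distribution over 20 amino acids
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in msa if seq[i] != GAP]
        if not residues:
            scores.append(0.0)  # all-gap column: treated as unconserved
            continue
        n = len(residues)
        entropy = -sum((c / n) * math.log2(c / n) for c in Counter(residues).values())
        scores.append(1.0 - entropy / max_entropy)
    return scores


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over positions where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def msa_diversity(msa: list[str]) -> float:
    """Diversity as 1 - mean pairwise identity over all sequence pairs."""
    pairs = list(combinations(msa, 2))
    if not pairs:
        return 0.0
    return 1.0 - sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    toy_msa = ["MKTAY", "MKTAF", "MRTAY"]  # hypothetical toy alignment
    print([round(s, 3) for s in column_conservation(toy_msa)])
    print(round(msa_diversity(toy_msa), 3))
```

The two measures pull in opposite directions. An alignment of near-identical sequences scores high on conservation and low on diversity, which is why a generator would plausibly need to balance the two rather than maximize either alone.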