MotifBench: A standardized protein design benchmark for motif-scaffolding problems

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Protein design faces the motif-scaffolding problem: identifying diverse protein backbones (scaffolds) that accommodate a given functional motif and preserve its geometry. Evaluation practices currently vary across publications, which hinders reproducibility and cross-method comparison. Method: The paper introduces MotifBench, a standardized benchmark for this task. It comprises 30 test cases that are harder than those in earlier benchmarks, including problems with known solutions on which, to the authors' knowledge, state-of-the-art methods fail to identify any solution. The precisely specified evaluation pipeline builds on protein structure prediction and fixed-backbone sequence design, together with defined evaluation metrics. Contribution/Results: MotifBench enables fair, quantitative comparison across methods. The benchmark implementation and a leaderboard are available at github.com/blt2114/MotifBench. The benchmark exposes limitations of current approaches and provides a common foundation for advancing motif-scaffolding methods.

📝 Abstract
The motif-scaffolding problem is a central task in computational protein design: given the coordinates of atoms in a geometry chosen to confer a desired biochemical function (a motif), the task is to identify diverse protein structures (scaffolds) that include the motif and maintain its geometry. Significant recent progress on motif-scaffolding has been made possible by computational evaluation with reliable protein structure prediction and fixed-backbone sequence design methods. However, significant variability in evaluation strategies across publications has hindered comparability of results, challenged reproducibility, and impeded robust progress. In response, we introduce MotifBench, comprising (1) a precisely specified pipeline and evaluation metrics, (2) a collection of 30 benchmark problems, and (3) an implementation of this benchmark and leaderboard at github.com/blt2114/MotifBench. The MotifBench test cases are more difficult than those in earlier benchmarks, and include protein design problems for which solutions are known but on which, to the best of our knowledge, state-of-the-art methods fail to identify any solution.
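
One way to make "maintain its geometry" concrete is to measure the RMSD between the motif's reference coordinates and the corresponding atoms in a candidate scaffold after optimal rigid superposition. The sketch below does this with the Kabsch algorithm in NumPy. The function names, the index-based lookup of motif atoms in the scaffold, and the 1.0 Å tolerance are illustrative assumptions, not a description of the MotifBench pipeline itself.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both point sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Kabsch: SVD of the 3x3 cross-covariance matrix gives the optimal rotation.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Force a proper rotation (determinant +1) so mirror images are not matched.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    diff = Pc @ R.T - Qc
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))


def motif_preserved(motif_xyz, scaffold_xyz, motif_idx, tol=1.0):
    """Check whether scaffold atoms at motif_idx reproduce the motif geometry.

    tol is a hypothetical tolerance in the same units as the coordinates
    (e.g. angstroms); it is an assumption made for this example.
    """
    placed = np.asarray(scaffold_xyz, dtype=float)[list(motif_idx)]
    return kabsch_rmsd(placed, motif_xyz) <= tol


if __name__ == "__main__":
    # Toy motif: four atoms at the corners of a small tetrahedron.
    motif = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)

    # Toy scaffold: the motif rotated about z and shifted, padded with other atoms.
    c, s = np.cos(0.7), np.sin(0.7)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    moved = motif @ rot.T + np.array([3.0, -2.0, 5.0])
    scaffold = np.vstack([np.full((2, 3), 9.0), moved, np.full((3, 3), -9.0)])

    print(motif_preserved(motif, scaffold, [2, 3, 4, 5]))
```

Because the superposition removes rotation and translation, a scaffold that contains a rigidly moved copy of the motif scores an RMSD near zero, while a mirrored or distorted copy does not.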
Problem

Research questions and friction points this paper is trying to address.

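Lack of a standardized evaluation protocol for motif-scaffolding methods
Variability in evaluation strategies across publications, which hinders comparability and reproducibility
Need for more challenging test cases, including problems with known solutions that state-of-the-art methods fail to solve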
Innovation

Methods, ideas, or system contributions that make the work stand out.

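Precisely specified evaluation pipeline and metrics for motif-scaffolding
Collection of 30 benchmark problems, more difficult than those in earlier benchmarks
Implementation of the benchmark and a leaderboard at github.com/blt2114/MotifBench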
Authors

Zhuoqi Zheng
Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University

Bo Zhang
Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University; School of Life Sciences, Tsinghua University

Kieran Didi
NVIDIA, Oxford University
Machine Learning, Protein Design, Artificial Intelligence, Drug Design

Kevin K. Yang
Microsoft Research
machine learning, protein engineering, computational biology

Jason Yim
MIT
Machine learning, computational biology, computer science

Joseph L. Watson
Microsoft Research

Hai-Feng Chen
Department of Bioinformatics and Biostatistics, Shanghai Jiao Tong University

Brian L. Trippe
Department of Statistics, Stanford University; Stanford Data Science, Stanford University