🤖 AI Summary
Current antibody affinity evaluation methods typically analyze antibody sequences or structures in isolation, lacking a unified benchmark that treats the antibody–antigen (Ab–Ag) complex as the functional unit and reflects true binding capability. To address this, we propose AbBiBench—the first function-oriented evaluation framework grounded in complex likelihood estimation, breaking from conventional single-antibody assessment paradigms. AbBiBench integrates masked language modeling, autoregressive generation, inverse folding, diffusion-based structure generation, and geometric graph neural networks, jointly scoring candidates on experimental affinity, structural integrity, and biophysical properties. We systematically evaluate 14 state-of-the-art models on a benchmark comprising 9 antigens and approximately 156,000 antibody variants. Results show that structure-conditioned inverse folding models achieve top performance. In an H1N1 antibody design case study, AbBiBench demonstrates strong predictive validity: model-derived complex likelihood correlates significantly with experimental dissociation constants (K<sub>D</sub>; Pearson *r* = 0.72).
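The scoring idea above—treating the Ab–Ag complex as the unit and using a protein model's likelihood on it—can be sketched as a pseudo-log-likelihood sum over complex positions. The sketch below is illustrative only: in practice the per-position log-probabilities would come from one of the benchmarked protein models (e.g., a masked language model run on the full complex), while here a hypothetical toy table stands in for the model.

```python
import math

def pseudo_log_likelihood(complex_seq, per_position_log_probs):
    """Sum of log P(residue_i | rest of complex) over all positions.

    In a complex-likelihood setup, per_position_log_probs would be produced
    by a protein model conditioned on the whole Ab-Ag complex; here it is a
    hypothetical precomputed table used purely for illustration.
    """
    return sum(per_position_log_probs[i][aa] for i, aa in enumerate(complex_seq))

# Toy two-residue "complex" with made-up model log-probabilities.
toy_seq = "AC"
toy_log_probs = [
    {"A": math.log(0.9), "C": math.log(0.1)},  # position 0: model favors A
    {"A": math.log(0.1), "C": math.log(0.9)},  # position 1: model favors C
]
score = pseudo_log_likelihood(toy_seq, toy_log_probs)
```

A candidate antibody variant would then be ranked by this score computed on its mutated complex: higher total log-likelihood is taken as evidence of a more plausible, better-binding complex.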
📝 Abstract
We introduce AbBiBench (Antibody Binding Benchmarking), a benchmarking framework for antibody binding affinity maturation and design. Unlike existing antibody evaluation strategies that rely on the antibody alone and its similarity to natural ones (e.g., amino acid identity rate, structural RMSD), AbBiBench considers an antibody-antigen (Ab-Ag) complex as a functional unit and evaluates the potential of an antibody design to bind a given antigen by measuring a protein model's likelihood on the Ab-Ag complex. We first curate, standardize, and share 9 datasets containing 9 antigens (spanning influenza, lysozyme, HER2, VEGF, integrin, and SARS-CoV-2) and 155,853 heavy-chain-mutated antibodies. Using these datasets, we systematically compare 14 protein models including masked language models, autoregressive language models, inverse folding models, diffusion-based generative models, and geometric graph models. The correlation between model likelihood and experimental affinity values is used to evaluate model performance. Additionally, in a case study aimed at increasing the binding affinity of antibody F045-092 to the influenza H1N1 antigen, we evaluate the generative power of the top-performing models by sampling a set of new antibodies binding to the antigen and ranking them by the structural integrity and biophysical properties of the Ab-Ag complex. As a result, structure-conditioned inverse folding models outperform others in both affinity correlation and generation tasks. Overall, AbBiBench provides a unified, biologically grounded evaluation framework to facilitate the development of more effective, function-aware antibody design models.
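The core evaluation signal described above is the correlation between model likelihoods and experimental affinities. A minimal sketch, assuming hypothetical per-variant complex log-likelihoods and affinities expressed as pK<sub>D</sub> = −log<sub>10</sub>(K<sub>D</sub>) (so larger means tighter binding; the numbers below are invented for illustration):

```python
def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Hypothetical complex log-likelihoods from a protein model for five
# antibody variants, paired with their (invented) measured pKD values.
log_likelihoods = [-120.5, -118.2, -115.9, -114.1, -110.0]
pkd = [7.1, 7.6, 8.0, 8.4, 9.0]

r = pearson_r(log_likelihoods, pkd)  # near +1: likelihood tracks affinity
```

A benchmark of this kind would compute such a correlation per antigen dataset and per model, and rank models by it; in practice a rank correlation (Spearman) is often reported alongside Pearson to guard against non-linear likelihood scales.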