AbRank: A Benchmark Dataset and Metric-Learning Framework for Antibody-Antigen Affinity Ranking

📅 2025-06-21
🤖 AI Summary
Antibody–antigen binding affinity prediction is hindered by noisy experimental labels, heterogeneous assay conditions, and poor generalization. To address these challenges, we propose AbRank, a benchmark and evaluation framework that reformulates affinity prediction as a pairwise ranking task. AbRank aggregates over 380,000 binding assays from nine heterogeneous sources and introduces standardized data splits with systematically increasing distribution shift, from point mutations to *novel antigens* and *novel antibodies*. An *m*-confident ranking scheme filters out comparisons with marginal affinity differences, keeping only pairs that differ by at least *m*-fold in measured binding strength. Our baseline model, WALLE-Affinity, is a graph-based approach that integrates protein language model (PLM) embeddings with structural information to predict pairwise binding preferences. Experiments show that existing methods degrade significantly under realistic generalization settings, whereas ranking-based training improves robustness and transferability, providing a foundation for scalable, structure-aware antibody therapeutic design.

📝 Abstract
Accurate prediction of antibody-antigen (Ab-Ag) binding affinity is essential for therapeutic design and vaccine development, yet the performance of current models is limited by noisy experimental labels, heterogeneous assay conditions, and poor generalization across the vast antibody and antigen sequence space. We introduce AbRank, a large-scale benchmark and evaluation framework that reframes affinity prediction as a pairwise ranking problem. AbRank aggregates over 380,000 binding assays from nine heterogeneous sources, spanning diverse antibodies, antigens, and experimental conditions, and introduces standardized data splits that systematically increase distribution shift, from local perturbations such as point mutations to broad generalization across novel antigens and antibodies. To ensure robust supervision, AbRank defines an m-confident ranking framework by filtering out comparisons with marginal affinity differences, focusing training on pairs with at least an m-fold difference in measured binding strength. As a baseline for the benchmark, we introduce WALLE-Affinity, a graph-based approach that integrates protein language model embeddings with structural information to predict pairwise binding preferences. Our benchmarks reveal significant limitations in current methods under realistic generalization settings and demonstrate that ranking-based training improves robustness and transferability. In summary, AbRank offers a robust foundation for machine learning models to generalize across the antibody-antigen space, with direct relevance for scalable, structure-aware antibody therapeutic design.
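
The m-confident filtering described in the abstract can be illustrated with a minimal sketch. It assumes affinities are given as dissociation constants (K_D, in molar units, lower meaning stronger binding); the function name `m_confident_pairs`, the dictionary layout, and the example values are hypothetical and not taken from the paper.

```python
from itertools import combinations


def m_confident_pairs(kd_by_complex, m=10.0):
    """Return (stronger, weaker) pairs whose K_D values differ by >= m-fold.

    kd_by_complex maps a complex identifier to a positive K_D value
    (assumption: lower K_D means stronger binding).
    """
    if m < 1:
        raise ValueError("m must be >= 1")
    pairs = []
    for (a, kd_a), (b, kd_b) in combinations(kd_by_complex.items(), 2):
        # Order the two complexes so that `stronger` has the lower K_D.
        stronger, weaker = (a, b) if kd_a <= kd_b else (b, a)
        # Keep the comparison only if the fold difference is at least m.
        if kd_by_complex[weaker] >= m * kd_by_complex[stronger]:
            pairs.append((stronger, weaker))
    return pairs


# Hypothetical measurements, not real data.
example = {"ab1": 1e-9, "ab2": 5e-9, "ab3": 2e-7}
pairs = m_confident_pairs(example, m=10.0)
```

With these illustrative values, the comparison between `ab1` and `ab2` (a 5-fold difference) is dropped, while both comparisons against `ab3` are kept. A larger `m` yields fewer but more reliable training pairs.
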
Problem

Research questions and friction points this paper is trying to address.

Predict antibody-antigen binding affinity accurately
Overcome noisy labels and poor generalization
Standardize evaluation for diverse antibodies and antigens
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pairwise ranking framework for affinity prediction
Standardized data splits for systematic evaluation
Graph-based model integrating embeddings and structure
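
To make the pairwise ranking idea concrete, here is a minimal sketch of a generic logistic pairwise loss and a pairwise accuracy metric. This is a common stand-in for ranking objectives, not necessarily the loss or metric used by WALLE-Affinity; the function names and scores are hypothetical, and higher scores are assumed to mean stronger predicted binding.

```python
import math


def pairwise_logistic_loss(score_stronger, score_weaker):
    """log(1 + exp(-(s_stronger - s_weaker))), computed in a numerically stable way."""
    diff = score_stronger - score_weaker
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    # For negative diff, rewrite to avoid overflow in exp(-diff).
    return -diff + math.log1p(math.exp(diff))


def pairwise_accuracy(scores, pairs):
    """Fraction of (stronger, weaker) pairs that the scores order correctly."""
    correct = sum(scores[s] > scores[w] for s, w in pairs)
    return correct / len(pairs)


# Hypothetical model scores and labeled pairs, for illustration only.
scores = {"ab1": 2.0, "ab2": 1.5, "ab3": 1.8}
labeled_pairs = [("ab1", "ab3"), ("ab2", "ab3")]
accuracy = pairwise_accuracy(scores, labeled_pairs)
```

The loss is small when the stronger binder is scored well above the weaker one and grows roughly linearly when the order is inverted, so training depends only on relative order and not on absolute affinity values measured under different assay conditions.
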
Authors

Chunan Liu
Structural Molecular Biology, Division of Biosciences, University College London, United Kingdom
Aurelien Pelissier
IBM Research Zurich, ETH Zurich
AI, Computational Biology, Physics
Yanjun Shao
Biomedical Informatics and Data Science, Yale School of Medicine, United States
Lilian Denzler
Structural Molecular Biology, Division of Biosciences, University College London, United Kingdom; Biomedical Informatics and Data Science, Yale School of Medicine, United States
Andrew C. R. Martin
Structural Molecular Biology, Division of Biosciences, University College London, United Kingdom
Brooks Paige
Associate Professor, University College London
Machine Learning, Statistics
Maria Rodriguez Martinez
Biomedical Informatics and Data Science, Yale School of Medicine, United States