🤖 AI Summary
Antibody–antigen binding affinity prediction is hindered by noisy experimental labels, heterogeneous assay conditions, and poor generalization. To address these challenges, we propose AbRank, a benchmark framework that reformulates affinity prediction as a pairwise ranking task. AbRank integrates over 380,000 binding measurements from nine heterogeneous sources and introduces standardized data splits with systematically increasing distribution shift. An *m*-confident ranking mechanism filters out pairs whose measured affinities differ by less than *m*-fold. The benchmark also defines a generalization evaluation protocol covering both *novel antibodies* and *novel antigens*. As a baseline, we introduce WALLE-Affinity, which combines protein language model (PLM) embeddings with 3D structural representations via a graph neural network and employs metric learning to optimize ranking performance. Experiments show that existing methods degrade significantly under realistic generalization settings, whereas ranking-based training improves model robustness and cross-target transferability. This offers a scalable, structure-aware paradigm for antibody drug design.
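The pairwise ranking formulation can be illustrated with a short sketch. Each labelled pair says that complex A binds more strongly than complex B, and a logistic (Bradley-Terry-style) loss penalizes the model when A's score does not exceed B's. Pairwise accuracy is a simple metric to go with it. This is an illustrative assumption, not AbRank's or WALLE-Affinity's actual objective. The function names `pairwise_logistic_loss` and `pairwise_accuracy`, and the convention that a higher score means stronger binding, are hypothetical.

```python
import math
from typing import Sequence


def pairwise_logistic_loss(
    score_better: Sequence[float], score_worse: Sequence[float]
) -> float:
    """Mean logistic ranking loss over labelled pairs (hypothetical helper).

    Index i is one pair in which the first complex is known to bind more
    strongly; the model should assign it the higher score.
    """
    if len(score_better) != len(score_worse) or not score_better:
        raise ValueError("need equal-length, non-empty score lists")
    total = 0.0
    for s_a, s_b in zip(score_better, score_worse):
        margin = s_a - s_b
        # softplus(-margin) = log(1 + exp(-margin)), written to avoid overflow
        total += max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return total / len(score_better)


def pairwise_accuracy(
    score_better: Sequence[float], score_worse: Sequence[float]
) -> float:
    """Fraction of pairs ranked in the correct order (ties count as wrong)."""
    if len(score_better) != len(score_worse) or not score_better:
        raise ValueError("need equal-length, non-empty score lists")
    correct = sum(1 for s_a, s_b in zip(score_better, score_worse) if s_a > s_b)
    return correct / len(score_better)
```

The loss depends only on score differences, so the model never has to reproduce absolute affinity values measured under different assay conditions. Only the relative order within each pair matters.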
📝 Abstract
Accurate prediction of antibody-antigen (Ab-Ag) binding affinity is essential for therapeutic design and vaccine development, yet the performance of current models is limited by noisy experimental labels, heterogeneous assay conditions, and poor generalization across the vast antibody and antigen sequence space. We introduce AbRank, a large-scale benchmark and evaluation framework that reframes affinity prediction as a pairwise ranking problem. AbRank aggregates over 380,000 binding assays from nine heterogeneous sources, spanning diverse antibodies, antigens, and experimental conditions, and introduces standardized data splits that systematically increase distribution shift, from local perturbations such as point mutations to broad generalization across novel antigens and antibodies. To ensure robust supervision, AbRank defines an m-confident ranking framework by filtering out comparisons with marginal affinity differences, focusing training on pairs with at least an m-fold difference in measured binding strength. As a baseline for the benchmark, we introduce WALLE-Affinity, a graph-based approach that integrates protein language model embeddings with structural information to predict pairwise binding preferences. Our benchmarks reveal significant limitations in current methods under realistic generalization settings and demonstrate that ranking-based training improves robustness and transferability. In summary, AbRank offers a robust foundation for machine learning models to generalize across the antibody-antigen space, with direct relevance for scalable, structure-aware antibody therapeutic design.
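The m-confident filtering step can be sketched as follows. The sketch assumes affinities are reported as dissociation constants (Kd, where lower means stronger binding). It also assumes that only measurements sharing a comparison key, for example the same antigen and assay, are paired. The `Measurement` record, its `group` field, and `m_confident_pairs` are hypothetical names for illustration; AbRank's actual pairing rules and affinity units may differ.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Measurement:
    complex_id: str  # identifier of one antibody-antigen complex
    group: str       # hypothetical key: only complexes in the same group are compared
    kd: float        # dissociation constant; lower means stronger binding


def m_confident_pairs(
    measurements: Iterable[Measurement], m: float
) -> List[Tuple[str, str]]:
    """Return (stronger, weaker) pairs whose Kd values differ at least m-fold."""
    if m <= 1:
        raise ValueError("m must be > 1 so that tied measurements are never paired")
    by_group: Dict[str, List[Measurement]] = {}
    for rec in measurements:
        if rec.kd <= 0:
            raise ValueError(f"non-positive Kd for {rec.complex_id}")
        by_group.setdefault(rec.group, []).append(rec)

    pairs: List[Tuple[str, str]] = []
    for records in by_group.values():
        for a, b in combinations(records, 2):
            strong, weak = (a, b) if a.kd < b.kd else (b, a)
            # keep the pair only if the weaker binder's Kd is at least m times larger
            if weak.kd / strong.kd >= m:
                pairs.append((strong.complex_id, weak.complex_id))
    return pairs
```

With m = 10, a pair with Kd values of 1 nM and 20 nM is kept. A pair at 1 nM versus 5 nM is discarded as too close to rank reliably given assay noise. Raising m trades training-set size for label confidence.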