🤖 AI Summary
This work addresses two obstacles to large-scale comparison of adaptive immune repertoires: the near-quadratic cost of pairwise affinity evaluation and data imbalance that obscures rare clonotypes. The authors propose an end-to-end, bias-aware framework that jointly designs antigen-aware sub-quadratic retrieval, multimodal embedding fusion, and fairness-constrained clustering. MinHash-based pre-filtering cuts the number of candidate comparisons, GPU-accelerated kernels compute affinities, and a differentiable gating mechanism adaptively weights alignment and embedding signals, balancing computational efficiency against biological fidelity. An automatic calibration module enforces proportional representation of rare immune subpopulations. Evaluated on viral and tumor immune repertoires, the approach improves throughput and reduces peak memory usage while maintaining or improving recall@k, cluster purity, and subgroup fairness.
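The gating idea can be illustrated with a minimal sketch. It assumes a single sigmoid gate computed from two per-pair scores, one from sequence alignment and one from embedding similarity; the function name `gated_fusion`, the parameters `w` and `b`, and the assumption that both scores lie in [0, 1] are illustrative and not taken from the paper.

```python
import math


def sigmoid(z):
    """Logistic function, smooth and differentiable everywhere."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(s_align, s_embed, w=(0.0, 0.0), b=0.0):
    """Blend two similarity scores for one sequence pair.

    s_align, s_embed: hypothetical alignment and embedding scores.
    w, b: hypothetical gate parameters that would be learned in practice.
    Returns (fused_score, gate), where gate is the weight on s_align.
    """
    # The gate depends on the pair's own scores, so the weighting is per pair.
    gate = sigmoid(w[0] * s_align + w[1] * s_embed + b)
    # Convex combination: the result always lies between the two inputs.
    fused = gate * s_align + (1.0 - gate) * s_embed
    return fused, gate
```

Because every operation is differentiable, `w` and `b` could in principle be fitted by gradient descent against a downstream objective. With all parameters at zero the gate is 0.5 and the fused score is the plain average of the two channels.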
📝 Abstract
Comparative analysis of adaptive immune repertoires at population scale is hampered by two practical bottlenecks: the near-quadratic cost of pairwise affinity evaluations and dataset imbalances that obscure clinically important minority clonotypes. We introduce SubQuad, an end-to-end pipeline that addresses these challenges by combining antigen-aware, near-subquadratic retrieval with GPU-accelerated affinity kernels, learned multimodal fusion, and fairness-constrained clustering. The system employs compact MinHash prefiltering to sharply reduce candidate comparisons, a differentiable gating module that adaptively weights complementary alignment and embedding channels on a per-pair basis, and an automated calibration routine that enforces proportional representation of rare antigen-specific subgroups. On large viral and tumor repertoires, SubQuad improves throughput and reduces peak memory usage while preserving or improving recall@k, cluster purity, and subgroup equity. By co-designing indexing, similarity fusion, and equity-aware objectives, SubQuad offers a scalable, bias-aware platform for repertoire mining and downstream translational tasks such as vaccine target prioritization and biomarker discovery.
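How MinHash prefiltering can shrink the set of pairs sent to an expensive affinity kernel is sketched below. This is a generic MinHash-with-banding illustration, not SubQuad's implementation: the k-mer size, the number of hashes and bands, and the helper names `kmers`, `minhash_signature`, and `candidate_pairs` are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def kmers(seq, k=3):
    """Set of overlapping k-mers; a sequence shorter than k is one shingle."""
    if len(seq) < k:
        return {seq}
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, num_hashes=32, k=3):
    """Per seeded hash function, keep the minimum hash over all k-mers."""
    shingles = kmers(seq, k)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # distinct salt = distinct hash
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(),
                "little",
            )
            for s in shingles
        ))
    return tuple(signature)


def candidate_pairs(seqs, num_hashes=32, bands=8, k=3):
    """Return index pairs (i, j), i < j, sharing at least one signature band."""
    rows = num_hashes // bands
    buckets = defaultdict(list)
    for idx, seq in enumerate(seqs):
        sig = minhash_signature(seq, num_hashes, k)
        for band in range(bands):
            key = (band, sig[band * rows:(band + 1) * rows])
            buckets[key].append(idx)
    pairs = set()
    for members in buckets.values():
        # Only sequences that collide in a bucket are compared later.
        pairs.update(combinations(members, 2))
    return pairs
```

Only the returned pairs would go on to full affinity scoring, so the downstream cost scales with the number of bucket collisions, not with all n(n-1)/2 pairs. More bands with fewer rows each raise recall at the cost of more candidates.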