🤖 AI Summary
The proliferation of foundation models in computational pathology has created a model-selection problem: individual models perform inconsistently across downstream tasks, while fully fine-tuning many candidates incurs prohibitive computational cost. To address this, the authors propose LogitProd, a plug-and-play fusion method that integrates heterogeneous pathology foundation models at the logit level. Treating each pre-trained model as a fixed expert, LogitProd adaptively weights the experts' logits per sample and combines them via a weighted product, eliminating the need for encoder retraining or feature alignment. This approach is the first to enable efficient ensemble learning in logit space for heterogeneous pathology models, with a theoretical guarantee that the fused predictor performs no worse than the best individual expert. Experiments show LogitProd achieves top performance on 20 of 22 pathology tasks, yielding an average improvement of approximately 3% while requiring only one-twelfth the training cost of feature-level fusion methods.
📝 Abstract
Pathology foundation models (FMs) have become central to computational histopathology, offering strong transfer performance across a wide range of diagnostic and prognostic tasks. The rapid proliferation of pathology foundation models creates a model-selection bottleneck: no single model is uniformly best, yet exhaustively adapting and validating many candidates for each downstream endpoint is prohibitively expensive. We address this challenge with a lightweight and novel model fusion strategy, LogitProd, which treats independently trained FM-based predictors as fixed experts and learns sample-adaptive fusion weights over their slide-level outputs. The fusion operates purely on logits, requiring no encoder retraining and no feature-space alignment across heterogeneous backbones. We further provide a theoretical analysis showing that the optimal weighted product fusion is guaranteed to perform at least as well as the best individual expert under the training objective. We systematically evaluate LogitProd on **22** benchmarks spanning WSI-level classification, tile-level classification, gene mutation prediction, and discrete-time survival modeling. LogitProd ranks first on 20/22 tasks and improves the average performance across all tasks by ~3% over the strongest single expert. LogitProd enables practitioners to upgrade heterogeneous FM-based pipelines in a plug-and-play manner, achieving multi-expert gains with ~12× lower training cost than feature-fusion alternatives.
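The fusion rule described above can be sketched in a few lines. A weighted product of the experts' probability distributions is equivalent, up to normalization, to a softmax over a weighted sum of their logits, so the combination can be done entirely in logit space. This is a minimal illustrative sketch, not the paper's implementation: the gating mechanism here (a softmax over per-sample scores) and all names are assumptions, since the abstract does not specify how the sample-adaptive weights are parameterized.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def fuse_logits(expert_logits, gate_scores):
    """Weighted-product fusion of fixed experts, done in logit space.

    expert_logits: (K, C) array of class logits from K frozen experts
                   for one sample.
    gate_scores:   (K,) unnormalized per-sample gating scores
                   (hypothetical gating; the paper's exact form may differ).

    Softmaxing the gate scores gives fusion weights w; the fused logits
    sum_k w_k * z_k correspond, after a softmax, to the normalized
    weighted geometric mean of the expert distributions:
        p_fused(c) ∝ prod_k p_k(c) ** w_k
    """
    w = softmax(gate_scores)                           # (K,) fusion weights
    fused = (w[:, None] * expert_logits).sum(axis=0)   # (C,) fused logits
    return fused, w
```

Because the experts stay frozen and only the gate is learned, upgrading an existing pipeline amounts to caching each expert's logits and training the small gating function; if the gate places all its mass on one expert, the fused prediction reduces to that expert's softmax output.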