Who is the Winning Algorithm? Rank Aggregation for Comparative Studies

📅 2026-01-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of accurately estimating the probability that an algorithm will achieve the top rank on a future, unseen dataset, based on its complete rankings across multiple benchmark datasets. Moving beyond conventional approaches that rely solely on win counts, the paper proposes a novel framework that systematically incorporates full ranking information into probabilistic modeling for the first time. By integrating rank aggregation with maximum likelihood estimation, the method leverages total order data to perform principled statistical inference, substantially improving the accuracy of winning probability predictions. Empirical evaluations on both synthetic and real-world datasets demonstrate that the proposed approach consistently outperforms existing techniques and enables more reliable identification of the best-performing algorithm.

📝 Abstract
Consider a collection of m competing machine learning algorithms. Given their performance on a benchmark of datasets, we would like to identify the best-performing algorithm: specifically, the algorithm most likely to "win" (rank highest) on a future, unseen dataset. The standard maximum likelihood approach suggests counting the number of wins for each algorithm. In this work, we argue that the complete rankings carry much more information, namely the number of times each algorithm finished second, third, and so forth. Yet it is not entirely clear how to use this information effectively for our purpose. We introduce a novel conceptual framework for estimating the win probability of each of the m algorithms, given their complete rankings over a benchmark of datasets. Our proposed framework significantly improves upon currently known methods in synthetic and real-world examples.
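
To make the baseline concrete, here is a minimal Python sketch of the win-counting estimate the abstract describes, together with the fuller rank-count table (how often each algorithm finished first, second, third, and so on). It is not the paper's proposed framework, which the abstract does not spell out. The function names, the input layout (one best-to-worst ordering of algorithm indices per dataset), and the toy rankings are assumptions made for illustration only.

```python
def rank_count_matrix(rankings, m):
    """Count how often each algorithm lands in each position.

    rankings: list of orderings, one per benchmark dataset; each ordering
              lists the algorithm indices 0..m-1 from best to worst
              (assumed input layout, not taken from the paper).
    Returns an m x m table where counts[a][r] is the number of datasets
    on which algorithm a finished in position r (r = 0 means first).
    """
    counts = [[0] * m for _ in range(m)]
    for order in rankings:
        for position, algo in enumerate(order):
            counts[algo][position] += 1
    return counts


def win_count_estimate(rankings, m):
    """Baseline estimate: fraction of datasets each algorithm won."""
    counts = rank_count_matrix(rankings, m)
    n = len(rankings)
    return [counts[a][0] / n for a in range(m)]


# Toy benchmark (made-up data): 3 algorithms ranked on 4 datasets.
rankings = [
    [0, 1, 2],
    [1, 0, 2],
    [0, 2, 1],
    [0, 1, 2],
]
counts = rank_count_matrix(rankings, 3)
p_win = win_count_estimate(rankings, 3)
print(counts)  # full rank counts per algorithm
print(p_win)   # win-count estimate per algorithm
```

In this toy example, algorithms 1 and 2 differ little in wins (one versus zero) but clearly in how often they place second or third. The win-count estimate reads only the first column of the table and discards the rest, which is the information the abstract argues should be used.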
Problem

Research questions and friction points this paper is trying to address.

rank aggregation
win probability
algorithm comparison
machine learning evaluation
benchmarking
Innovation

Methods, ideas, or system contributions that make the work stand out.

rank aggregation
win probability estimation
algorithm comparison
complete rankings
benchmark evaluation