🤖 AI Summary
In ranking tasks, items with similar scores are highly sensitive to minor perturbations under noisy data, causing severe rank instability; existing theoretical analyses rely on stringent separability assumptions that are often violated in practice. To address this, we propose the first distribution-agnostic ranking stability framework that imposes no assumptions on data distribution or candidate set size. We introduce two novel robust ranking operators—“inflated top-k” and “inflated full ranking”—which output controllable sets to guarantee strong stability. By integrating robust statistics with combinatorial decision theory, our approach eliminates dependence on score margins. We theoretically establish stability bounds independent of the number of candidates, enabling scalability to large-scale settings. Empirical evaluation on real-world datasets demonstrates substantial improvements in robustness while preserving information fidelity.
📝 Abstract
In this work, we consider ranking problems among a finite set of candidates: for instance, selecting the top-$k$ items among a larger list of candidates or obtaining the full ranking of all items in the set. These problems are often unstable, in the sense that estimating a ranking from noisy data can exhibit high sensitivity to small perturbations. Concretely, if we use data to provide a score for each item (say, by aggregating preference data over a sample of users), then for two items with similar scores, small fluctuations in the data can alter the relative ranking of those items. Many existing theoretical results for ranking problems assume a separation condition to avoid this challenge, but real-world data often contains items whose scores are approximately tied, limiting the applicability of existing theory. To address this gap, we develop a new algorithmic stability framework for ranking problems, and propose two novel ranking operators for achieving stable rankings: the *inflated top-$k$* for the top-$k$ selection problem and the *inflated full ranking* for ranking the full list. To enable stability, each method allows for expressing some uncertainty in the output. For both problems, our proposed methods provide guaranteed stability, with no assumptions on data distributions and no dependence on the total number of candidates to be ranked. Experiments on real-world data confirm that the proposed methods offer stability without compromising the informativeness of the output.
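The abstract does not spell out how the inflated operators are constructed, but the core idea, outputting a slightly enlarged set so that near-tied items are all included rather than arbitrarily split, can be sketched as follows. This is a minimal illustration assuming the operator inflates the selection by a score tolerance `eps`; the paper's actual construction and its stability guarantees may differ.

```python
def inflated_top_k(scores, k, eps):
    """Illustrative 'inflated' top-k selection (assumed form, not the
    paper's exact operator): return every item whose score is within
    eps of the k-th largest score. Near-ties at the selection boundary
    are all included, so a small perturbation of the scores cannot
    arbitrarily flip which of two tied items makes the cut.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of items")
    # Threshold: k-th largest score, relaxed by the tolerance eps.
    threshold = sorted(scores.values(), reverse=True)[k - 1] - eps
    return {item for item, s in scores.items() if s >= threshold}
```

For example, with scores `{"a": 0.9, "b": 0.7, "c": 0.69, "d": 0.1}` and `k = 2`, a tolerance of `eps = 0.05` returns the inflated set `{"a", "b", "c"}`: items `b` and `c` are approximately tied at the boundary, so both are kept instead of letting noise decide between them. Setting `eps = 0` recovers the ordinary (unstable) top-$k$.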