What Is the Optimal Ranking Score Between Precision and Recall? We Can Always Find It and It Is Rarely $F_1$

📅 2025-11-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
In classification tasks, the conventional F₁ score (and Fβ with any fixed β) fails to achieve optimal precision–recall trade-offs under arbitrary performance distributions. This paper establishes the theoretical suboptimality of fixed-β Fβ scores for rank-based compromise and proposes a framework that optimizes Kendall rank correlation instead. The authors derive a closed-form solution for the optimal β that adapts to any precision–recall distribution, integrating rank-correlation modeling, weighted harmonic mean analysis, and shortest-path ranking theory, thereby eliminating reliance on fixed β values. Across six empirical case studies, the computed optimal β consistently deviates significantly from 1 (mean deviation: 0.35), yielding stable improvements over standard F₁. The work provides both theoretical grounding and practical tools for threshold selection in classification models, with open-sourced, reproducible software.
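For reference, the Fβ score discussed in the summary is the weighted harmonic mean of precision and recall. A minimal sketch (the function name and example values are illustrative, not from the paper):

```python
def f_beta(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (the F_beta score).

    beta > 1 weights recall more heavily; beta < 1 weights precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)

# F_1 is the balanced case; other beta values shift the trade-off.
print(f_beta(0.8, 0.4, beta=1.0))  # ≈ 0.533 (balanced)
print(f_beta(0.8, 0.4, beta=0.5))  # ≈ 0.667 (precision-weighted)
print(f_beta(0.8, 0.4, beta=2.0))  # ≈ 0.444 (recall-weighted)
```

Note how, for the same (precision, recall) pair, moving β away from 1 changes not just the value but, across a set of models, the induced ranking, which is the paper's object of study.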

📝 Abstract
Ranking methods or models based on their performance is of prime importance but is tricky because performance is fundamentally multidimensional. In the case of classification, precision and recall are scores with probabilistic interpretations that are both important to consider and complementary. The rankings induced by these two scores are often in partial contradiction. In practice, therefore, it is extremely useful to establish a compromise between the two views to obtain a single, global ranking. Over the last fifty years or so, it has been proposed to take a weighted harmonic mean, known as the F-score, F-measure, or $F_β$. Generally speaking, by averaging basic scores, we obtain a score that is intermediate in terms of values. However, there is no guarantee that these scores lead to meaningful rankings and no guarantee that the rankings are good tradeoffs between these base scores. Given the ubiquity of $F_β$ scores in the literature, some clarification is in order. Concretely: (1) We establish that $F_β$-induced rankings are meaningful and define a shortest path between precision- and recall-induced rankings. (2) We frame the problem of finding a tradeoff between two scores as an optimization problem expressed with Kendall rank correlations. We show that $F_1$ and its skew-insensitive version are far from being optimal in that regard. (3) We provide theoretical tools and a closed-form expression to find the optimal value for $β$ for any distribution or set of performances, and we illustrate their use on six case studies.
Problem

Research questions and friction points this paper is trying to address.

Finding optimal tradeoff between precision and recall rankings
Demonstrating F1 score is suboptimal for ranking compromise
Providing method to compute optimal beta for F-beta score
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fβ rankings define shortest path between precision and recall
Framed tradeoff as optimization problem using Kendall correlations
Provided closed-form expression to find optimal β for distributions
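The Kendall-correlation framing above can be illustrated with a small sketch. The paper derives a closed-form expression for the optimal β; since that expression is not reproduced here, this sketch substitutes a simple grid search over candidate β values, picking the one whose Fβ-induced ranking best agrees with both the precision and the recall rankings (maximizing the smaller of the two Kendall correlations). All names and the (precision, recall) pairs are hypothetical:

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall rank correlation between two equal-length score lists."""
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        sx = (xs[i] > xs[j]) - (xs[i] < xs[j])
        sy = (ys[i] > ys[j]) - (ys[i] < ys[j])
        if sx * sy > 0:
            concordant += 1
        elif sx * sy < 0:
            discordant += 1
    pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / pairs

def f_beta(p, r, beta):
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r) if p + r else 0.0

def best_beta(perfs, betas):
    """Grid-search stand-in for the paper's closed form: choose the beta
    whose F_beta ranking maximizes the worse of its two Kendall
    correlations with the precision and recall rankings."""
    def worst_case(beta):
        f = [f_beta(p, r, beta) for p, r in perfs]
        tau_p = kendall_tau(f, [p for p, _ in perfs])
        tau_r = kendall_tau(f, [r for _, r in perfs])
        return min(tau_p, tau_r)
    return max(betas, key=worst_case)

# Hypothetical (precision, recall) pairs for a handful of models.
models = [(0.9, 0.3), (0.7, 0.6), (0.55, 0.75), (0.4, 0.9), (0.85, 0.5)]
grid = [0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0]
print(best_beta(models, grid))
```

Per the summary, the optimal β found this way (or via the paper's closed form) is rarely 1, i.e. F₁ is rarely the best compromise for a given distribution of performances.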