🤖 AI Summary
This paper systematically addresses the applicability of bias–variance decompositions in machine learning: *which loss functions admit a rigorous, interpretable decomposition?* Studying continuous, nonnegative losses that satisfy the identity of indiscernibles, under mild regularity conditions, the authors consider *g-Bregman divergences* (Bregman divergences composed with an invertible change of variables) and establish a **necessary and sufficient condition** for decomposability: a loss function admits a clean bias–variance decomposition if and only if it is a g-Bregman divergence. As a corollary, the squared Mahalanobis distance is, up to such a variable transformation, the *only symmetric* loss with this property. These results provide a unified theoretical foundation for widely used losses, including the squared error and cross-entropy, and clarify why losses outside this family, such as the zero-one loss and the $L_1$ loss, do not admit a clean decomposition. The paper also examines how relaxing the restrictions on the loss functions affects these results.
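For orientation, the kind of decomposition the paper calls *clean* can be illustrated by the identity known to hold for Bregman divergences (a standard result sketched here for context; the paper's exact statement and notation may differ): writing $D_\phi(u, v) = \phi(u) - \phi(v) - \nabla\phi(v)^{\top}(u - v)$ for a strictly convex generator $\phi$, taking the label $Y$ independent of the prediction $\hat{Y}$, and setting $\bar{y} = \mathbb{E}[Y]$ and $\mathring{y} = (\nabla\phi)^{-1}\big(\mathbb{E}[\nabla\phi(\hat{Y})]\big)$,

$$
\mathbb{E}\big[D_\phi(Y, \hat{Y})\big]
= \underbrace{\mathbb{E}\big[D_\phi(Y, \bar{y})\big]}_{\text{noise}}
\;+\; \underbrace{D_\phi(\bar{y}, \mathring{y})}_{\text{bias}}
\;+\; \underbrace{\mathbb{E}\big[D_\phi(\mathring{y}, \hat{Y})\big]}_{\text{variance}}.
$$

With $\phi(u) = \|u\|^2$ this reduces to the familiar noise + bias$^2$ + variance split of the squared error. A $g$-Bregman divergence is, roughly, a loss of the form $D_\phi\big(g(y), g(\hat{y})\big)$ for some invertible map $g$; the paper's contribution is to show that, under its regularity conditions, no loss outside this family decomposes cleanly.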
📝 Abstract
Bias–variance decompositions are widely used to understand the generalization performance of machine learning models. While the squared error loss permits a straightforward decomposition, other loss functions, such as the zero-one loss or the $L_1$ loss, either lack a decomposition in which bias and variance sum to the expected loss or rely on definitions that lack the essential properties of meaningful bias and variance. Recent research has shown that clean decompositions can be achieved for the broader class of Bregman divergences, with the cross-entropy loss as a special case. However, the necessary and sufficient conditions for these decompositions remain an open question. In this paper, we address this question by studying continuous, nonnegative loss functions that satisfy the identity of indiscernibles, under mild regularity conditions. We prove that so-called $g$-Bregman divergences are the only such loss functions that have a clean bias–variance decomposition. A $g$-Bregman divergence can be transformed into a standard Bregman divergence through an invertible change of variables. This makes the squared Mahalanobis distance, up to such a variable transformation, the only symmetric loss function with a clean bias–variance decomposition. We also examine how relaxing these restrictions on the loss functions affects our results.
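To make the decomposition sketched above concrete, here is a minimal, self-contained numerical check (illustrative code, not taken from the paper; all function and variable names are made up) of the three-term identity for two members of the Bregman family mentioned in the abstract: the squared error and the KL divergence that underlies the cross-entropy loss. The expectation over labels and predictions is taken over the product of two independent empirical samples, so the identity holds exactly up to floating-point error.

```python
# Illustrative sketch: verify noise + bias + variance = expected loss
# for two Bregman divergences (squared error and generalized KL).
import numpy as np

rng = np.random.default_rng(0)


def check_squared_error(n_labels=400, n_preds=300):
    """phi(u) = u**2: the dual mean of the predictions is the ordinary mean."""
    y = rng.normal(loc=1.0, scale=0.5, size=n_labels)  # empirical label sample
    f = rng.normal(loc=1.3, scale=0.2, size=n_preds)   # empirical prediction sample, independent of y
    expected_loss = np.mean((y[:, None] - f[None, :]) ** 2)  # expectation over the product measure
    noise = np.mean((y - y.mean()) ** 2)
    bias = (y.mean() - f.mean()) ** 2
    variance = np.mean((f - f.mean()) ** 2)
    print("squared error:", expected_loss, "=", noise + bias + variance)


def gkl(u, v):
    """Generalized KL divergence, the Bregman divergence of phi(x) = sum x log x - x."""
    return np.sum(u * (np.log(u) - np.log(v)) - u + v, axis=-1)


def check_kl(n_labels=400, n_preds=300, k=3):
    """On the simplex gkl equals KL; the dual mean is the coordinatewise geometric mean."""
    p = rng.dirichlet(np.full(k, 5.0), size=n_labels)  # label distributions
    q = rng.dirichlet(np.full(k, 2.0), size=n_preds)   # predicted distributions, independent of p
    expected_loss = gkl(p[:, None, :], q[None, :, :]).mean()
    p_bar = p.mean(axis=0)                             # ordinary mean of the labels
    q_ring = np.exp(np.log(q).mean(axis=0))            # dual mean; need not lie on the simplex
    noise = gkl(p, p_bar).mean()
    bias = gkl(p_bar, q_ring)
    variance = gkl(q_ring, q).mean()
    print("KL divergence:", expected_loss, "=", noise + bias + variance)


check_squared_error()
check_kl()
```

For a $g$-Bregman loss, the same check would by definition go through after first mapping both the labels and the predictions through the invertible change of variables $g$.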