🤖 AI Summary
Background: The classical bias–variance decomposition is restricted to squared error loss, limiting its applicability in statistical learning, where non-quadratic losses (e.g., log loss, exponential loss) are common.
Method: Leveraging convex analysis and statistical decision theory, we generalize the decomposition to prediction errors measured by arbitrary Bregman divergences, deriving a rigorous formula relevant to maximum likelihood estimation with exponential families and specifying precise conditions under which it holds.
Contribution/Results: Our framework unifies previously fragmented results for specific losses and supplies the clear, self-contained derivation the literature previously lacked. It enhances the interpretability and pedagogical utility of the bias–variance trade-off and provides a principled, general-purpose tool for model diagnostics and generalization analysis beyond squared error, enabling a coherent error decomposition across diverse loss functions grounded in information geometry.
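For orientation, here is the shape of the result (a sketch in our own notation, which need not match the paper's): the squared error is replaced by the Bregman divergence $D_\phi$ generated by a strictly convex, differentiable function $\phi$, and the mean prediction entering the bias and variance terms is taken in dual coordinates.

```latex
% Sketch in our own notation (not necessarily the paper's);
% y is the label and \hat{y} the prediction, assumed independent of y.
\[
  D_\phi(p, q) \;=\; \phi(p) - \phi(q) - \langle p - q,\, \nabla\phi(q) \rangle
\]
\[
  \mathbb{E}\big[D_\phi(y, \hat{y})\big]
    \;=\; \underbrace{\mathbb{E}\big[D_\phi(y, \bar{y})\big]}_{\text{noise}}
    \;+\; \underbrace{D_\phi(\bar{y}, \mathring{y})}_{\text{bias}}
    \;+\; \underbrace{\mathbb{E}\big[D_\phi(\mathring{y}, \hat{y})\big]}_{\text{variance}}
\]
\[
  \text{where } \bar{y} = \mathbb{E}[y]
  \quad\text{and}\quad
  \mathring{y} = (\nabla\phi)^{-1}\big(\mathbb{E}[\nabla\phi(\hat{y})]\big).
\]
```

Taking $\phi(x) = x^2$ recovers the classical decomposition, with the dual mean $\mathring{y}$ reducing to the ordinary mean $\mathbb{E}[\hat{y}]$.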
📝 Abstract
The bias-variance decomposition is a central result in statistics and machine learning, but is typically presented only for the squared error. We present a generalization of the bias-variance decomposition where the prediction error is a Bregman divergence, which is relevant to maximum likelihood estimation with exponential families. While the result is already known, there was not previously a clear, standalone derivation, so we provide one for pedagogical purposes. A version of this note previously appeared on the author's personal website without context. Here we provide additional discussion and references to the relevant prior literature.
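As a concrete illustration (our own sketch, not code from the note), the decomposition sketched above can be checked by Monte Carlo for the Bregman divergence generated by $\phi(x) = x\log x - x$, the generalized KL divergence on the positive reals, whose dual mean is the geometric mean:

```python
import numpy as np

rng = np.random.default_rng(0)

def bregman_kl(p, q):
    """Bregman divergence generated by phi(x) = x*log(x) - x,
    i.e. the generalized KL divergence on the positive reals."""
    return p * np.log(p / q) - p + q

# Labels y and predictions y_hat, drawn independently (both positive).
n = 1_000_000
y = rng.gamma(shape=4.0, scale=0.5, size=n)        # E[y] = 2.0
y_hat = rng.lognormal(mean=0.5, sigma=0.3, size=n)

y_bar = y.mean()                       # mean label
# Dual mean (grad phi)^{-1}(E[grad phi(y_hat)]): here grad phi = log,
# so the dual mean is the geometric mean of the predictions.
y_ring = np.exp(np.log(y_hat).mean())

lhs = bregman_kl(y, y_hat).mean()            # expected prediction error
noise = bregman_kl(y, y_bar).mean()          # irreducible error
bias = bregman_kl(y_bar, y_ring)             # bias term
variance = bregman_kl(y_ring, y_hat).mean()  # variance term

print(f"E[D(y, y_hat)]          = {lhs:.6f}")
print(f"noise + bias + variance = {noise + bias + variance:.6f}")
```

Up to Monte Carlo error the two printed quantities agree; swapping in $\phi(x) = x^2$ turns `bregman_kl` into the squared error and `y_ring` into the ordinary mean, recovering the textbook decomposition.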