Distribution-dependent Generalization Bounds for Tuning Linear Regression Across Tasks

📅 2025-07-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses joint hyperparameter tuning of regularization penalties (specifically ℓ₁, ℓ₂, and the elastic net) in linear regression across multiple related tasks, with emphasis on controlling the generalization error of the validation loss in high-dimensional settings. To overcome the dimensional degradation inherent in distribution-free generalization bounds, which scale poorly with dimension $d$, the authors derive a distribution-dependent upper bound on the validation-loss generalization error under sub-Gaussian design assumptions. Crucially, this bound is dimension-independent and substantially tighter than existing uniform bounds when $d$ is large. The analysis is further extended to a generalization of ridge regression that incorporates an estimate of the mean of the ground-truth distribution, yielding an even sharper bound. The key contribution is a dimension-free, distribution-adaptive generalization guarantee for this cross-task tuning setting, breaking from the conventional distribution-free paradigm while still supporting overfitting mitigation and sparse variable selection.
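To make the setup concrete, the per-task estimator and the tuning objective can be sketched as follows. The notation (per-task design matrices $X_t$, validation splits, sample sizes $m$ and $m'$, and penalty weights $\lambda_1, \lambda_2$) is an illustrative reading of the summary, not the paper's exact formulation.

```latex
% Per-task elastic net fit for a shared hyperparameter pair (lambda_1, lambda_2)
\hat{\beta}_t(\lambda_1,\lambda_2)
  \in \arg\min_{\beta \in \mathbb{R}^d}
    \tfrac{1}{m}\lVert X_t \beta - y_t \rVert_2^2
    + \lambda_1 \lVert \beta \rVert_1
    + \lambda_2 \lVert \beta \rVert_2^2

% Shared hyperparameters chosen to minimize the average validation loss over T tasks
(\hat{\lambda}_1,\hat{\lambda}_2)
  \in \arg\min_{\lambda_1,\lambda_2 \ge 0}
    \frac{1}{T}\sum_{t=1}^{T}
    \tfrac{1}{m'}\bigl\lVert X_t^{\mathrm{val}}\,\hat{\beta}_t(\lambda_1,\lambda_2) - y_t^{\mathrm{val}} \bigr\rVert_2^2
```

Setting $\lambda_2 = 0$ recovers the lasso and $\lambda_1 = 0$ recovers ridge, matching the penalty families named above; the paper's bounds concern, roughly, how far this empirical average validation loss can deviate from its expectation.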

📝 Abstract
Modern regression problems often involve high-dimensional data, and careful tuning of the regularization hyperparameters is crucial to avoid overly complex models that may overfit the training data while guaranteeing desirable properties like effective variable selection. We study the recently introduced problem of tuning regularization hyperparameters in linear regression across multiple related tasks. We obtain distribution-dependent bounds on the generalization error of the validation loss when tuning the L1 and L2 coefficients, covering ridge, lasso, and the elastic net. In contrast, prior work develops bounds that apply uniformly to all distributions, but such bounds necessarily degrade with the feature dimension $d$. While those bounds are shown to be tight for worst-case distributions, our bounds improve with the "niceness" of the data distribution. Concretely, we show that under the additional assumption that instances within each task are i.i.d. draws from broad, well-studied classes of distributions, including sub-Gaussians, our generalization bounds do not get worse with increasing $d$ and are much sharper than prior work for very large $d$. We also extend our results to a generalization of ridge regression, where we achieve tighter bounds that take into account an estimate of the mean of the ground-truth distribution.
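As a minimal illustration of tuning a shared penalty by average validation loss across tasks (a sketch, not the paper's algorithm), the grid search below uses scikit-learn's ElasticNet. Note that its (alpha, l1_ratio) parameterization differs from the (λ₁, λ₂) pair above, and the function name and grid arguments are assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def tune_shared_penalty(tasks, alphas, l1_ratios):
    """Pick one (alpha, l1_ratio) pair shared by all tasks by minimizing
    the average held-out validation MSE.

    tasks: list of (X_train, y_train, X_val, y_val) tuples, one per task.
    """
    best_pair, best_loss = None, np.inf
    for alpha in alphas:
        for l1_ratio in l1_ratios:
            val_losses = []
            for X_tr, y_tr, X_val, y_val in tasks:
                # Fit the elastic net for this task with the shared penalty.
                model = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, max_iter=10_000)
                model.fit(X_tr, y_tr)
                # Held-out validation loss for this task.
                val_losses.append(np.mean((model.predict(X_val) - y_val) ** 2))
            avg_loss = float(np.mean(val_losses))
            if avg_loss < best_loss:
                best_pair, best_loss = (alpha, l1_ratio), avg_loss
    return best_pair, best_loss
```

The quantity the paper studies is, in essence, how far such an empirical average validation loss can stray from its expectation as the penalty is tuned; the distribution-dependent analysis shows this gap need not grow with the feature dimension under sub-Gaussian data.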
Problem

Research questions and friction points this paper is trying to address.

How to tune L1 and L2 regularization hyperparameters jointly in linear regression across multiple related tasks
How to bound the generalization error of the validation loss used for this cross-task tuning
Whether distribution-free bounds, which necessarily degrade with the feature dimension d, can be sharpened under distributional assumptions such as sub-Gaussian data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distribution-dependent generalization bounds for jointly tuning the L1 and L2 coefficients (ridge, lasso, elastic net) across tasks
Dimension-independent bounds under sub-Gaussian data assumptions, much sharper than prior distribution-free bounds for large d
Extension to a generalization of ridge regression that uses an estimate of the ground-truth mean, yielding even tighter bounds (see the sketch below)
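The last item refers to the generalization of ridge regression mentioned in the abstract, which takes into account an estimate of the mean of the ground-truth distribution. One plausible reading (our assumption, not the paper's stated definition) is ridge shrinkage toward that estimated mean rather than toward zero, sketched below.

```python
import numpy as np

def generalized_ridge(X, y, lam, mu_hat):
    """Ridge regression that shrinks toward an estimated mean mu_hat instead of 0:

        beta_hat = argmin_beta ||X beta - y||^2 + lam * ||beta - mu_hat||^2

    The first-order condition gives the closed form
        (X^T X + lam * I) beta_hat = X^T y + lam * mu_hat.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * mu_hat)
```

Setting mu_hat to the zero vector recovers ordinary ridge regression, so this form strictly generalizes the standard penalty.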
🔎 Similar Papers
No similar papers found.