🤖 AI Summary
Labeled data for a target task is often scarce in commercial settings, which severely constrains supervised learning. This paper studies adaptive sample sharing under ridge regression: leveraging auxiliary data to improve target prediction while rigorously avoiding negative transfer. We propose an adaptive sample-borrowing criterion based on an estimate of the transfer gain, which determines the number of auxiliary samples to borrow via Gaussian feature modeling and data-driven strategies. Under finite-sample conditions, we derive a theoretical error bound and explicitly characterize the relationship between dataset similarity and performance gain. Experiments on both synthetic and real-world datasets demonstrate that the method significantly outperforms single-task training and strong baselines, while consistently preventing negative transfer.
📝 Abstract
In many business settings, task-specific labeled data are scarce or costly to obtain, which limits supervised learning on the task of interest. To address this challenge, we study sample sharing in the case of ridge regression: leveraging an auxiliary dataset while explicitly protecting against negative transfer. We introduce a principled, data-driven rule that decides how many samples from an auxiliary dataset to add to the target training set. The rule is based on an estimate of the transfer gain, i.e., the marginal reduction in predictive error. Building on this estimator, we derive finite-sample guarantees: under standard conditions, the procedure borrows when doing so improves parameter estimation and abstains otherwise. In the Gaussian feature setting, we analyze which dataset properties ensure that borrowing samples reduces the predictive error. We validate the approach on synthetic and real datasets, observing consistent gains over strong baselines and single-task training while avoiding negative transfer.
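The borrowing rule described above can be illustrated with a minimal sketch. This is not the paper's exact estimator: here the transfer gain is approximated as the change in held-out prediction error on a validation split of the target data, and a plain closed-form ridge solver is used. All function names and parameters below are hypothetical.

```python
# Illustrative sketch of adaptive sample borrowing for ridge regression.
# Assumption: the transfer gain is estimated via held-out target error;
# the rule borrows the k auxiliary samples that minimize that error and
# abstains (returns 0) when no k improves it, guarding against negative
# transfer. Names (adaptive_borrow, ridge_fit) are hypothetical.
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution: beta = (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(beta, X, y):
    """Mean squared prediction error of beta on (X, y)."""
    return float(np.mean((X @ beta - y) ** 2))

def adaptive_borrow(X_t, y_t, X_a, y_a, lam=1.0, val_frac=0.3, seed=0):
    """Return the number of auxiliary samples whose inclusion minimizes
    the estimated target validation error; 0 means 'do not borrow'."""
    rng = np.random.default_rng(seed)
    n = len(y_t)
    idx = rng.permutation(n)
    n_val = max(1, int(val_frac * n))
    val, tr = idx[:n_val], idx[n_val:]
    X_tr, y_tr = X_t[tr], y_t[tr]
    X_val, y_val = X_t[val], y_t[val]

    # Baseline: target-only training (the "abstain" option).
    best_k = 0
    best_err = mse(ridge_fit(X_tr, y_tr, lam), X_val, y_val)

    # Greedily grow the borrowed prefix and keep the best error.
    for k in range(1, len(y_a) + 1):
        X_k = np.vstack([X_tr, X_a[:k]])
        y_k = np.concatenate([y_tr, y_a[:k]])
        err = mse(ridge_fit(X_k, y_k, lam), X_val, y_val)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

When the auxiliary data come from a nearby regression task, the estimated gain is positive and the rule borrows; when the tasks conflict, every borrowed sample inflates the validation error and the rule falls back to single-task training, which mirrors the borrow-or-abstain behavior the abstract describes.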