🤖 AI Summary
Labeled data for a target task is often scarce in commercial settings, which severely constrains supervised learning. This paper studies adaptive sample sharing under ridge regression: leveraging auxiliary data to improve target prediction while rigorously avoiding negative transfer. We propose an adaptive sample-borrowing criterion based on an estimate of the transfer gain, which determines the number of auxiliary samples to borrow via Gaussian feature modeling and data-driven strategies. Under finite-sample conditions, we derive a theoretical error bound and explicitly characterize the relationship between dataset similarity and performance gain. Experiments on both synthetic and real-world datasets demonstrate that the method significantly outperforms single-task training and strong baselines, while consistently preventing negative transfer.
📝 Abstract
In many business settings, task-specific labeled data are scarce or costly to obtain, which limits supervised learning on the task of interest. To address this challenge, we study sample sharing in the case of ridge regression: leveraging an auxiliary dataset while explicitly protecting against negative transfer. We introduce a principled, data-driven rule that decides how many samples from an auxiliary dataset to add to the target training set. The rule is based on an estimate of the transfer gain, i.e., the marginal reduction in predictive error. Building on this estimator, we derive finite-sample guarantees: under standard conditions, the procedure borrows when doing so improves parameter estimation and abstains otherwise. In the Gaussian feature setting, we analyze which dataset properties ensure that borrowing samples reduces the predictive error. We validate the approach on synthetic and real datasets, observing consistent gains over strong baselines and single-task training while avoiding negative transfer.
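The borrowing rule described above can be illustrated with a minimal sketch. This is not the paper's exact estimator: here the transfer gain is approximated as the change in held-out prediction error on a validation split of the target data, and a plain closed-form ridge solver is used. All function names and parameters below are hypothetical.

```python
# Illustrative sketch of adaptive sample borrowing for ridge regression.
# Assumption: the transfer gain is estimated via held-out target error;
# the rule borrows the k auxiliary samples that minimize that error and
# abstains (returns 0) when no k improves it, guarding against negative
# transfer. Names (adaptive_borrow, ridge_fit) are hypothetical.
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge solution: beta = (X'X + lam*I)^{-1} X'y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def mse(beta, X, y):
    """Mean squared prediction error of beta on (X, y)."""
    return float(np.mean((X @ beta - y) ** 2))

def adaptive_borrow(X_t, y_t, X_a, y_a, lam=1.0, val_frac=0.3, seed=0):
    """Return the number of auxiliary samples whose inclusion minimizes
    the estimated target validation error; 0 means 'do not borrow'."""
    rng = np.random.default_rng(seed)
    n = len(y_t)
    idx = rng.permutation(n)
    n_val = max(1, int(val_frac * n))
    val, tr = idx[:n_val], idx[n_val:]
    X_tr, y_tr = X_t[tr], y_t[tr]
    X_val, y_val = X_t[val], y_t[val]

    # Baseline: target-only training (the "abstain" option).
    best_k = 0
    best_err = mse(ridge_fit(X_tr, y_tr, lam), X_val, y_val)

    # Greedily grow the borrowed prefix and keep the best error.
    for k in range(1, len(y_a) + 1):
        X_k = np.vstack([X_tr, X_a[:k]])
        y_k = np.concatenate([y_tr, y_a[:k]])
        err = mse(ridge_fit(X_k, y_k, lam), X_val, y_val)
        if err < best_err:
            best_k, best_err = k, err
    return best_k
```

When the auxiliary data come from a nearby regression task, the estimated gain is positive and the rule borrows; when the tasks conflict, every borrowed sample inflates the validation error and the rule falls back to single-task training, which mirrors the borrow-or-abstain behavior the abstract describes.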