🤖 AI Summary
In transfer learning, data from previous studies are often available only in incomplete forms, such as summary statistics, point estimates, or lists of relevant features, which complicates high-dimensional sparse regression and feature selection. Method: We propose an empirical Bayes structure-learning framework that avoids the need for complete raw data by estimating prior hyperparameters from heterogeneous, multi-source summary information. Contribution/Results: Under sparsity and beta-min conditions weaker than those required by full Bayes and other standard criteria, the method attains variable selection consistency at faster convergence rates, while balancing computational efficiency and statistical accuracy. Experiments on synthetic and real-world transfer learning scenarios show that, although fully Bayesian inference already performs well, empirical Bayes data integration delivers moderate yet meaningful gains in accuracy, robustness, and adaptability, particularly in data-scarce, cross-study integration tasks where access to the original datasets is limited or infeasible.
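Both the summary and the abstract below invoke sparsity and beta-min conditions without stating them. For orientation, a standard form of the beta-min assumption in sparse linear regression reads (the paper's exact constants and scaling may differ):

$$\min_{j:\,\beta_j \neq 0} |\beta_j| \;\ge\; C\sqrt{\frac{\log p}{n}} \quad \text{for some } C > 0,$$

i.e., every truly nonzero coefficient must exceed the noise level implied by sample size $n$ and dimension $p$, so that no relevant feature is too weak to detect.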
📝 Abstract
We discuss the use of empirical Bayes for data integration, in the sense of transfer learning. Our main interest is in settings where one wishes to learn structure (e.g., feature selection) and has access only to incomplete data from previous studies, such as summaries, estimates, or lists of relevant features. We discuss differences between full Bayes and empirical Bayes, and develop a computational framework for the latter. We show how empirical Bayes attains consistent variable selection under weaker conditions (sparsity and beta-min assumptions) than full Bayes and other standard criteria do, and how it attains faster convergence rates. Our high-dimensional regression examples show that fully Bayesian inference enjoys excellent properties, and that data integration with empirical Bayes can offer moderate yet meaningful improvements in practice.
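To make the empirical Bayes recipe concrete, here is a minimal, self-contained sketch, not the paper's actual algorithm: it assumes the only transferred information is a list of features flagged by a previous study (`prior_list`, hypothetical), encodes it as two prior inclusion probabilities (`theta0` off the list, `theta1` on it), fits these hyperparameters by maximising the marginal likelihood of the current data under a Zellner g-prior, and reports posterior inclusion probabilities. The names, the two-level prior parametrisation, and the grid-search estimator are all illustrative assumptions.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

# Current (target) study: modest n, sparse signal.
n, p = 60, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[[0, 1, 2]] = [1.0, -1.0, 0.8]   # truly relevant features
y = X @ beta_true + rng.standard_normal(n)
y = y - y.mean()                          # centre y; no intercept below

# Incomplete prior information: only a feature list from an earlier study
# (hypothetical: it flagged features 0, 1 and 5; its raw data are unavailable).
prior_list = {0, 1, 5}
in_list = np.array([j in prior_list for j in range(p)])

g = float(n)                              # unit-information g-prior

def log_marginal(gamma):
    """log m(y | gamma) under Zellner's g-prior, up to a model-free constant."""
    k = int(gamma.sum())
    yty = float(y @ y)
    if k == 0:
        return -0.5 * n * np.log(yty)
    Xg = X[:, gamma]
    yhat = Xg @ np.linalg.solve(Xg.T @ Xg, Xg.T @ y)   # projection onto X_gamma
    return -0.5 * k * np.log(1 + g) - 0.5 * n * np.log(yty - g / (1 + g) * (y @ yhat))

models = [np.array(m, dtype=bool) for m in itertools.product([0, 1], repeat=p)]
logm = np.array([log_marginal(m) for m in models])
G = np.array(models, dtype=float)         # (2^p, p) inclusion indicators

def log_prior(theta0, theta1):
    """log p(gamma): inclusion prob. theta1 on the transferred list, theta0 off it."""
    th = np.where(in_list, theta1, theta0)
    return G @ np.log(th) + (1.0 - G) @ np.log(1.0 - th)

# Empirical Bayes: pick (theta0, theta1) maximising the overall marginal likelihood.
grid = np.linspace(0.05, 0.95, 19)
(t0_hat, t1_hat), best = (None, None), -np.inf
for t0 in grid:
    for t1 in grid:
        val = np.logaddexp.reduce(logm + log_prior(t0, t1))
        if val > best:
            (t0_hat, t1_hat), best = (t0, t1), val

# Posterior inclusion probabilities under the fitted hyperparameters.
lw = logm + log_prior(t0_hat, t1_hat)
w = np.exp(lw - np.logaddexp.reduce(lw))
pip = w @ G
print(f"EB hyperparameters: theta0={t0_hat:.2f}, theta1={t1_hat:.2f}")
for j in range(p):
    print(f"feature {j}: PIP={pip[j]:.2f} (true beta={beta_true[j]:+.1f})")
```

With exhaustive model enumeration the sketch is exact but only feasible for small p; in the high-dimensional regimes the paper targets, one would need its computational framework (or another approximation) rather than brute force.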