🤖 AI Summary
This study addresses the challenge of covariance estimation in transcriptome-wide association studies (TWAS) involving multi-source vector-valued responses, particularly when the underlying parameter structure (sparse or dense, low-rank or non-low-rank) is unknown. The authors propose a linear shrinkage estimator grounded in empirical Bayes methodology, which integrates high-dimensional multi-response regression information by applying both global and local shrinkage to the singular values of the data matrix. Unlike conventional estimators that assume either sparsity or low-rank structure, the approach requires neither assumption, and it scales better computationally than full Bayes procedures. Under a specific loss function, the estimator achieves asymptotic optimality. Both numerical simulations and real-data analyses using GTEx TWAS data demonstrate its superior estimation accuracy and practical utility.
📝 Abstract
Motivated by applications in tissue-wide association studies (TWAS), we develop a flexible and theoretically grounded empirical Bayes approach for integrating vector-valued outcome data obtained from different sources. We propose a linear shrinkage estimator that effectively shrinks the singular values of a data matrix. This problem is closely connected to estimating covariance matrices under a specific loss, for which we develop asymptotically optimal estimators. The basic linear shrinkage estimator is then extended to a local linear shrinkage estimator, offering greater flexibility. Crucially, the proposed method works under sparse or dense and low-rank or non-low-rank parameter settings, unlike the well-known sparse or reduced-rank estimators in the literature. Furthermore, the empirical Bayes approach is more computationally scalable than intensive full Bayes procedures. The method is evaluated through an extensive set of numerical experiments and applied to a real TWAS dataset obtained from the Genotype-Tissue Expression (GTEx) project.
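To make the idea of linearly shrinking singular values concrete, the sketch below applies one data-driven factor to all singular values of a noisy matrix. It is not the estimator proposed in the paper: the shrinkage factor used here is the classical positive-part James-Stein rule, swapped in as a simple empirical Bayes analogue. The function name `global_linear_shrinkage`, the known noise variance `sigma2`, and the simulated data are all assumptions made for illustration.

```python
import numpy as np


def global_linear_shrinkage(X, sigma2):
    """Shrink every singular value of X by one common factor c in [0, 1].

    Illustrative only: c is the positive-part James-Stein factor for a
    matrix of independent Gaussian entries with known noise variance
    sigma2 (an assumption), not the estimator derived in the paper.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    total = np.sum(s ** 2)  # equals the squared Frobenius norm of X
    if total == 0.0:
        return X.copy(), 0.0
    d = X.size  # number of entries being estimated
    c = max(0.0, 1.0 - (d - 2) * sigma2 / total)
    # Multiplying all singular values by c is the same as scaling X by c.
    return U @ np.diag(c * s) @ Vt, c


# Hypothetical simulated data: true signal plus unit-variance noise.
rng = np.random.default_rng(0)
theta = rng.normal(size=(50, 10))
X = theta + rng.normal(size=(50, 10))
theta_hat, c = global_linear_shrinkage(X, sigma2=1.0)
```

A single global factor treats all singular values alike. A local variant in the spirit of the abstract would instead allow different shrinkage weights for different singular values, which this sketch does not attempt.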