🤖 AI Summary
This paper addresses asymptotic statistical inference for high-dimensional regression parameters subject to affine constraints. The motivating genetic application is estimating the causal effect of genetic factors on a continuous diabetes phenotype, with protein expression as a mediator. In this setting, the effect must satisfy linear constraints derived from the genetic determinants of proteins and from protein–phenotype genetic associations. We propose a class of constrained estimators based on convex optimization. Within a proportional asymptotic regime, we establish, for the first time, an asymptotic normality theory with sharp large-sample optimality. This theory rigorously characterizes the bias–variance trade-off while ensuring consistency, optimal convergence rates, and valid confidence intervals. Our method explicitly incorporates external biological priors (e.g., protein-mediated pathways). In both simulations and real genetic data, it substantially outperforms unconstrained benchmarks, combining theoretical guarantees with numerical stability.
📝 Abstract
We consider statistical inference in high-dimensional regression problems under affine constraints on the parameter space. Our theoretical study is motivated by research on the genetic determinants of diseases, such as diabetes, that uses external information from mediating protein expression levels. Specifically, we develop rigorous methods for estimating genetic effects on diabetes-related continuous outcomes when these associations are constrained by external information about the genetic determinants of proteins and the genetic relationships between proteins and the outcome of interest. To this end, we discuss multiple candidate estimators and study their theoretical properties, sharp large-sample optimality, and numerical performance under a high-dimensional proportional asymptotic framework.
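To make "regression under affine constraints" concrete, here is a minimal sketch of the simplest such estimator: least squares subject to an equality constraint A β = b, solved through its KKT system. This is not the paper's estimator. It assumes equality-form constraints, n > p (so the KKT matrix is nonsingular), and a simulated Gaussian design with p/n = 0.4 as a stand-in for the proportional regime. The function name `constrained_least_squares` and the constraint matrix `A` and vector `b` are hypothetical illustrations.

```python
import numpy as np


def constrained_least_squares(X, y, A, b):
    """Least squares subject to the affine equality constraint A @ beta = b.

    Solves the KKT system of
        minimize 0.5 * ||y - X beta||^2  subject to  A beta = b,
    namely
        [X^T X  A^T] [beta]   [X^T y]
        [A      0  ] [nu  ] = [b    ].
    Assumes the KKT matrix is nonsingular (e.g. n > p and A has full row rank).
    """
    p = X.shape[1]
    m = A.shape[0]
    kkt = np.block([[X.T @ X, A.T], [A, np.zeros((m, m))]])
    rhs = np.concatenate([X.T @ y, b])
    sol = np.linalg.solve(kkt, rhs)
    return sol[:p]  # drop the Lagrange multipliers nu


# Simulated proportional regime with p / n = 0.4 (an illustrative choice).
rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.standard_normal((n, p))
A = rng.standard_normal((5, p))      # hypothetical constraint matrix
beta_true = rng.standard_normal(p)
b = A @ beta_true                    # constraints hold at the true parameter
y = X @ beta_true + rng.standard_normal(n)

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]  # unconstrained benchmark
beta_con = constrained_least_squares(X, y, A, b)

# Constraint residual of the constrained fit (at floating-point precision).
print(np.abs(A @ beta_con - b).max())
# Squared estimation errors: unconstrained vs constrained.
print(np.sum((beta_ols - beta_true) ** 2), np.sum((beta_con - beta_true) ** 2))
```

When the constraints hold at the true parameter, the constrained fit is the projection of the unconstrained fit onto the affine set in the X^T X norm. As a result, its error in that norm can never exceed the unconstrained error. Inequality constraints or p > n would instead call for a general convex solver or added regularization.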