🤖 AI Summary
This study addresses high-dimensional two-sample mean comparison in single-cell transcriptomic data. It proposes a differential detection framework that integrates dimensionality reduction with debiasing: a “projected null hypothesis” and an “anchored projection” strategy adaptively capture local signal structure via low-dimensional projection, and semiparametric double machine learning (DML) debiases the projected statistic for valid inference. The framework controls Type I error under the null while improving statistical power for sparse and localized differential expression patterns, and it indicates where differentially expressed genes are likely located when the null is rejected. Experiments on synthetic data and a real single-cell sequencing dataset demonstrate the method's effectiveness and interpretability.
📝 Abstract
We study several variants of the high-dimensional mean inference problem motivated by modern single-cell genomics data. By taking advantage of low-dimensional and localized signal structures commonly seen in such data, our proposed methods not only have the usual frequentist validity but also provide useful information on the potential locations of the signal if the null hypothesis is rejected. Our approach adaptively projects the high-dimensional vector onto a low-dimensional space, followed by a debiasing step using the semiparametric double machine learning framework. Our analysis shows that debiasing is unnecessary under the global null, but necessary under a “projected null” that is of scientific interest. We also propose an “anchored projection” to maximize the power while avoiding the degeneracy issue under the null. Experiments on synthetic data and a real single-cell sequencing dataset demonstrate the effectiveness and interpretability of our methods.
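The project-then-test idea can be illustrated with a minimal sketch. This is not the paper's procedure: it replaces the double machine learning debiasing step with plain sample splitting (one half of the data chooses the projection direction, the other half is tested along it), and it does not implement the anchored projection. The function name `split_projection_test`, the normal approximation for the p-value, and the simulated data are all assumptions made for illustration.

```python
import math
import numpy as np


def split_projection_test(X, Y, rng):
    """Two-sample mean test along a data-driven direction (sample-splitting sketch).

    X, Y: arrays of shape (n_cells, n_genes). Hypothetical helper, not the paper's method.
    """
    # Shuffle rows, then split each sample into a "direction" half and a "test" half.
    Xp, Yp = rng.permutation(X), rng.permutation(Y)
    X_dir, X_test = Xp[: len(Xp) // 2], Xp[len(Xp) // 2 :]
    Y_dir, Y_test = Yp[: len(Yp) // 2], Yp[len(Yp) // 2 :]

    # Choose the projection direction from the first half only.
    diff = X_dir.mean(axis=0) - Y_dir.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm == 0:
        # Degenerate case: fall back to a fixed unit direction.
        v = np.ones(X.shape[1]) / math.sqrt(X.shape[1])
    else:
        v = diff / norm

    # Project the held-out half to one dimension and run a Welch-type z-test.
    x, y = X_test @ v, Y_test @ v
    se = math.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
    z = (x.mean() - y.mean()) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
    return z, p_value, v


# Simulated example (assumed setup): 200 genes, signal confined to the first 5.
rng = np.random.default_rng(0)
n, p = 400, 200
shift = np.zeros(p)
shift[:5] = 1.0
X = rng.normal(size=(n, p)) + shift
Y = rng.normal(size=(n, p))

z, p_value, v = split_projection_test(X, Y, rng)
top_genes = sorted(np.argsort(-np.abs(v))[:5].tolist())
print(f"z = {z:.2f}, p = {p_value:.3g}, largest loadings at genes {top_genes}")
```

Because the direction `v` is estimated on data independent of the test half, the one-dimensional test keeps its nominal level without further correction, at the cost of using only half the sample for inference. The loadings of `v` give a rough indication of where the signal sits, which loosely mirrors the localization property described in the abstract.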