Debiased Projected Two-Sample Comparisons for Single-Cell Expression Data

📅 2024-03-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the statistical challenge of high-dimensional two-sample mean comparison in single-cell transcriptomic data. Methodologically, we propose a novel differential detection framework that integrates dimensionality reduction with debiasing: we introduce the “projected null hypothesis” and an “anchored projection” strategy to adaptively capture local signal structures via low-dimensional projection, coupled with semiparametric double machine learning (DML) for unbiased and efficient inference. The framework rigorously controls Type I error under the null while substantially improving statistical power for sparse and localized differential expression patterns. Moreover, it enables interpretable localization of differentially expressed gene regions. In extensive simulations and real single-cell datasets, our method outperforms existing approaches in both statistical power and biological interpretability.

📝 Abstract

We study several variants of the high-dimensional mean inference problem motivated by modern single-cell genomics data. By taking advantage of low-dimensional and localized signal structures commonly seen in such data, our proposed methods not only have the usual frequentist validity but also provide useful information on the potential locations of the signal if the null hypothesis is rejected. Our method adaptively projects the high-dimensional vector onto a low-dimensional space, followed by a debiasing step using the semiparametric double-machine learning framework. Our analysis shows that debiasing is unnecessary under the global null, but necessary under a “projected null” that is of scientific interest. We also propose an “anchored projection” to maximize the power while avoiding the degeneracy issue under the null. Experiments on synthetic data and a real single-cell sequencing dataset demonstrate the effectiveness and interpretability of our methods.
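The "project, then test" idea in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: it swaps the double-machine-learning debiasing step for plain sample splitting, and it uses the normalized difference of group means as the projection direction rather than the anchored projection. The function name `projected_split_test`, the simulated data, and all parameter values are hypothetical choices made for illustration. One half of the cells is used to learn the direction and the other half to test along it, so the same observations are never used for both steps.

```python
import math
import numpy as np


def projected_split_test(x, y, seed=0):
    """Two-sample test along a data-driven 1-D projection, via sample splitting.

    x, y: arrays of shape (n_cells, n_genes), one per group.
    Returns (z statistic, two-sided normal-approximation p-value, direction).
    Illustrative stand-in only; not the debiased estimator from the paper.
    """
    rng = np.random.default_rng(seed)
    # Randomly split each group into a "learn" half and a "test" half.
    ix, iy = rng.permutation(len(x)), rng.permutation(len(y))
    x_learn, x_test = x[ix[: len(x) // 2]], x[ix[len(x) // 2:]]
    y_learn, y_test = y[iy[: len(y) // 2]], y[iy[len(y) // 2:]]

    # Step 1: learn a unit-norm projection direction on the first half only
    # (assumed here: the normalized difference of group means).
    diff = x_learn.mean(axis=0) - y_learn.mean(axis=0)
    norm = np.linalg.norm(diff)
    if norm > 0:
        direction = diff / norm
    else:
        # Fall back to a uniform unit vector if the mean difference is exactly zero.
        direction = np.full(x.shape[1], x.shape[1] ** -0.5)

    # Step 2: project the held-out half and compute a Welch-type z statistic.
    px, py = x_test @ direction, y_test @ direction
    se = math.sqrt(px.var(ddof=1) / len(px) + py.var(ddof=1) / len(py))
    z = (px.mean() - py.mean()) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p_value, direction


# Simulated example (hypothetical): signal localized in the first 10 of 200 genes.
rng = np.random.default_rng(1)
n_cells, n_genes = 200, 200
shift = np.zeros(n_genes)
shift[:10] = 1.0
x = rng.normal(size=(n_cells, n_genes)) + shift
y = rng.normal(size=(n_cells, n_genes))

z, p_value, direction = projected_split_test(x, y)
# Genes with the largest absolute loadings hint at where the signal sits.
top_genes = np.argsort(-np.abs(direction))[:5]
print(z, p_value, top_genes)
```

The loadings of `direction` give a rough view of which genes drive a rejection, which mirrors the interpretability goal stated in the abstract. Sample splitting discards half of the data at each step, which is the kind of efficiency loss a debiased estimator is meant to avoid.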

Problem

Research questions and friction points this paper is trying to address.

Address high-dimensional two-sample mean comparisons in single-cell data

Resolve the double-dipping issue that arises when the same data are used both to learn the projection and to perform inference

Improve power for testing the global null hypothesis in genomics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-adaptive projections for high-dimensional comparisons

Debiased estimator using double-machine learning

Flexible projection scheme for testing the global null hypothesis
