Robust mean change point testing in high-dimensional data with heavy tails

📅 2023-05-30
📈 Citations: 2
Influential: 0
🤖 AI Summary
This paper addresses testing for mean change points in heavy-tailed high-dimensional data, covering both exponentially and polynomially decaying tails as well as sparse and dense changes, where sparsity is measured by the ℓ₀-norm of the mean change vector. Methodologically, it introduces a CUSUM-type statistic for exponentially decaying tails and a median-of-means-type statistic for polynomially decaying tails. Theoretically, it characterises the boundary between the dense and sparse regimes under both tail conditions for the first time in the change point literature, with testing procedures that are optimal up to a poly-iterated logarithmic factor in each of the four tail and sparsity combinations. It shows that when only α-th moments with 2 ≤ α < 4 are bounded, the minimax testing rate has no sparse regime: testing sparse changes is information-theoretically as hard as testing dense changes. Extensions with near-optimal performance cover multiple change points, temporal dependence, and fewer than two finite moments. By comparison with earlier results under Gaussian assumptions, the work also quantifies the cost of heavy tails on the difficulty of change point testing.
📝 Abstract
We study mean change point testing problems for high-dimensional data, with exponentially- or polynomially-decaying tails. In each case, depending on the $\ell_0$-norm of the mean change vector, we separately consider dense and sparse regimes. We characterise the boundary between the dense and sparse regimes under the above two tail conditions for the first time in the change point literature and propose novel testing procedures that attain optimal rates in each of the four regimes up to a poly-iterated logarithmic factor. By comparing with previous results under Gaussian assumptions, our results quantify the costs of heavy-tailedness on the fundamental difficulty of change point testing problems for high-dimensional data. To be specific, when the error distributions possess exponentially-decaying tails, a CUSUM-type statistic is shown to achieve a minimax testing rate up to $\sqrt{\log\log(8n)}$. As for polynomially-decaying tails, admitting bounded $\alpha$-th moments for some $\alpha \geq 4$, we introduce a median-of-means-type test statistic that achieves a near-optimal testing rate in both dense and sparse regimes. In the sparse regime, we further propose a computationally-efficient test to achieve optimality. Our investigation in the even more challenging case of $2 \leq \alpha < 4$, unveils a new phenomenon that the minimax testing rate has no sparse regime, i.e. testing sparse changes is information-theoretically as hard as testing dense changes. Finally, we consider various extensions where we also obtain near-optimal performances, including testing against multiple change points, allowing temporal dependence as well as fewer than two finite moments in the data generating mechanisms. We also show how sub-Gaussian rates can be achieved when an additional minimal spacing condition is imposed under the alternative hypothesis.
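For readers unfamiliar with CUSUM-type statistics, the following is a minimal sketch of the textbook CUSUM transform for an n-by-p data matrix, aggregated with a squared Euclidean norm. It is an illustration under assumptions, not the statistic proposed in the paper: the function names `cusum_transform` and `dense_cusum_stat`, the choice of aggregation, and the absence of any centring or threshold are all hypothetical choices made here.

```python
import numpy as np


def cusum_transform(x):
    """Textbook CUSUM contrasts for an (n, p) array.

    Row t-1 of the result is sqrt(t(n-t)/n) times the difference between
    the mean of the first t rows and the mean of the remaining n-t rows.
    """
    x = np.asarray(x, dtype=float)
    n = x.shape[0]
    t = np.arange(1, n)[:, None]          # candidate split points 1..n-1
    left = np.cumsum(x, axis=0)[:-1]      # sums of the first t rows
    total = x.sum(axis=0)
    return np.sqrt(t * (n - t) / n) * (left / t - (total - left) / (n - t))


def dense_cusum_stat(x):
    """Hypothetical dense-style aggregate: largest squared l2 norm over splits."""
    c = cusum_transform(x)
    return float(np.max(np.sum(c ** 2, axis=1)))


# Toy data: 10 observations in 3 dimensions, mean shifts by 1 after time 5.
toy = np.zeros((10, 3))
toy[5:] += 1.0
toy_stat = dense_cusum_stat(toy)
toy_split = int(np.argmax(np.sum(cusum_transform(toy) ** 2, axis=1))) + 1
```

In this noiseless toy example the contrast is largest at the true split point, and a series with no mean change gives a statistic of zero. A real test would compare a suitably centred statistic against a calibrated threshold, which this sketch does not attempt.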
Problem

Research questions and friction points this paper is trying to address.

Testing mean change points in high-dimensional heavy-tailed data
Characterizing dense and sparse regime boundaries under tail decay
Developing optimal tests for exponential and polynomial tail distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

CUSUM-type statistic for exponentially-decaying tail data
Median-of-means test for polynomially-decaying tail distributions
Computationally-efficient sparse regime test achieving optimal performance
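The median-of-means idea named in the list above can be sketched as follows. This is the generic coordinate-wise median-of-means location estimator, shown only to illustrate why it resists heavy-tailed outliers; it is not the paper's test statistic, and the function name `median_of_means` and the block count are assumptions made for this example.

```python
import numpy as np


def median_of_means(x, n_blocks):
    """Coordinate-wise median of block means for an (n, p) array.

    The rows are split into n_blocks consecutive blocks, each block is
    averaged, and the median of those averages is taken per coordinate.
    """
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, n_blocks, axis=0)
    block_means = np.stack([b.mean(axis=0) for b in blocks])
    return np.median(block_means, axis=0)


# One extreme observation drags the sample mean but not the median of means.
sample = np.array([[1.0], [1.0], [1.0], [1.0], [1.0], [1000.0]])
plain_mean = sample.mean(axis=0)          # heavily affected by the outlier
robust_mean = median_of_means(sample, 3)  # block means are 1, 1, 500.5
```

Here the single outlier contaminates only one block, so the median over blocks ignores it, whereas the ordinary mean is pulled far from the bulk of the data.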
Mengchu Li
School of Mathematics, University of Birmingham
Yudong Chen
Department of Statistics, University of Warwick
Tengyao Wang
Professor in Statistics at London School of Economics
statistical theory and methodology
Yi Yu
Department of Statistics, University of Warwick