Robust mean change point testing in high-dimensional data with heavy tails

📅 2023-05-30

📈 Citations: 2

✨ Influential: 0

🤖 AI Summary

This paper studies testing for mean change points in high-dimensional data with heavy tails. It covers both exponentially and polynomially decaying tails and, within each, dense and sparse regimes defined by the ℓ₀-norm of the mean change vector. Methodologically, it proposes a CUSUM-type statistic for exponentially decaying tails and a median-of-means-type statistic for polynomially decaying tails with bounded α-th moments (α ≥ 4), plus a computationally efficient test for the sparse regime. Theoretically, it characterizes the boundary between the dense and sparse regimes under both tail conditions, which the authors describe as a first in the change point literature, and shows that the proposed tests attain optimal rates in each of the four regimes up to a poly-iterated logarithmic factor. For 2 ≤ α < 4, the minimax testing rate has no sparse regime: testing sparse changes is information-theoretically as hard as testing dense changes. Extensions with near-optimal performance cover multiple change points, temporal dependence, and data with fewer than two finite moments. Comparison with earlier results under Gaussian assumptions quantifies the cost of heavy tails on the difficulty of change point testing.

📝 Abstract

We study mean change point testing problems for high-dimensional data, with exponentially- or polynomially-decaying tails. In each case, depending on the $\ell_0$-norm of the mean change vector, we separately consider dense and sparse regimes. We characterise the boundary between the dense and sparse regimes under the above two tail conditions for the first time in the change point literature and propose novel testing procedures that attain optimal rates in each of the four regimes up to a poly-iterated logarithmic factor. By comparing with previous results under Gaussian assumptions, our results quantify the costs of heavy-tailedness on the fundamental difficulty of change point testing problems for high-dimensional data. To be specific, when the error distributions possess exponentially-decaying tails, a CUSUM-type statistic is shown to achieve a minimax testing rate up to $\sqrt{\log\log(8n)}$. As for polynomially-decaying tails, admitting bounded $\alpha$-th moments for some $\alpha \geq 4$, we introduce a median-of-means-type test statistic that achieves a near-optimal testing rate in both dense and sparse regimes. In the sparse regime, we further propose a computationally-efficient test to achieve optimality. Our investigation in the even more challenging case of $2 \leq \alpha < 4$, unveils a new phenomenon that the minimax testing rate has no sparse regime, i.e. testing sparse changes is information-theoretically as hard as testing dense changes. Finally, we consider various extensions where we also obtain near-optimal performances, including testing against multiple change points, allowing temporal dependence as well as fewer than two finite moments in the data generating mechanisms. We also show how sub-Gaussian rates can be achieved when an additional minimal spacing condition is imposed under the alternative hypothesis.

Problem

Research questions and friction points this paper is trying to address.

Testing mean change points in high-dimensional heavy-tailed data

Characterizing dense and sparse regime boundaries under tail decay

Developing optimal tests for exponential and polynomial tail distributions

Innovation

Methods, ideas, or system contributions that make the work stand out.

CUSUM-type statistic for exponentially-decaying tail data

Median-of-means test for polynomially-decaying tail distributions

Computationally-efficient sparse regime test achieving optimal performance
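
For readers unfamiliar with the two building blocks named above, the following is a minimal Python sketch of a textbook CUSUM contrast and a textbook median-of-means estimator. It is not the paper's test statistic: the function names (`cusum_vector`, `max_cusum_energy`, `median_of_means`), the squared-ℓ₂ aggregation over splits, and the contiguous equal-size blocking are all illustrative assumptions, and no calibration or rejection threshold is included.

```python
import math
from statistics import median


def cusum_vector(x, t):
    """Textbook CUSUM contrast at split t (illustrative, not the paper's statistic).

    x is a list of n observations, each a list of p floats.
    Returns sqrt(t(n-t)/n) * (mean of first t rows - mean of last n-t rows),
    computed coordinate by coordinate.
    """
    n, p = len(x), len(x[0])
    if not 0 < t < n:
        raise ValueError("t must satisfy 0 < t < n")
    scale = math.sqrt(t * (n - t) / n)
    contrast = []
    for j in range(p):
        left_mean = sum(row[j] for row in x[:t]) / t
        right_mean = sum(row[j] for row in x[t:]) / (n - t)
        contrast.append(scale * (left_mean - right_mean))
    return contrast


def max_cusum_energy(x):
    """Maximum over all splits of the squared l2 norm of the CUSUM contrast.

    Summing over every coordinate is an assumed, dense-style aggregation;
    a sparse-style variant would typically threshold coordinates first.
    """
    n = len(x)
    return max(sum(c * c for c in cusum_vector(x, t)) for t in range(1, n))


def median_of_means(values, n_blocks):
    """Median of block means, a standard robust location estimate.

    Splits values into n_blocks contiguous blocks of equal size (trailing
    values that do not fill a block are dropped), averages each block,
    and returns the median of those averages.
    """
    if not 1 <= n_blocks <= len(values):
        raise ValueError("n_blocks must be between 1 and len(values)")
    size = len(values) // n_blocks
    block_means = [
        sum(values[b * size:(b + 1) * size]) / size for b in range(n_blocks)
    ]
    return median(block_means)


# Toy data: the mean shifts from 0 to 1 in both coordinates after two rows.
toy = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
energy = max_cusum_energy(toy)

# A single outlier (100) moves the plain average but not the block median.
robust_centre = median_of_means([1, 1, 1, 1, 1, 100], 3)
```

The second toy call illustrates why block medians are attractive under polynomially decaying tails: one extreme value contaminates only one block mean, so the median of the block means is unaffected.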

🔎 Similar Papers

Improving Numerical Stability of Normalized Mutual Information Estimator on High Dimensions