Estimation of multiple mean vectors in high dimension

📅 2024-03-22
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the joint estimation of mean vectors for multiple probability distributions in high-dimensional spaces. Given independent samples, the authors form estimators as data-dependent convex combinations of the empirical means and propose two strategies for choosing the weights: a testing procedure that identifies neighbouring means with low variance and yields a closed-form plug-in formula, and a minimization of an upper confidence bound (UCB) on the quadratic risk. Theoretical analysis shows that, under high-dimensional asymptotics, the estimators asymptotically attain an oracle (minimax) improvement in quadratic risk over the empirical means. Experiments on synthetic and real-world tasks, including kernel mean embedding estimation, demonstrate consistent gains over independent empirical mean estimators, with the largest improvements in settings of high effective dimension.

📝 Abstract
We endeavour to estimate numerous multi-dimensional means of various probability distributions on a common space based on independent samples. Our approach involves forming estimators through convex combinations of empirical means derived from these samples. We introduce two strategies to find appropriate data-dependent convex combination weights: a first one employing a testing procedure to identify neighbouring means with low variance, which results in a closed-form plug-in formula for the weights, and a second one determining weights via minimization of an upper confidence bound on the quadratic risk. Through theoretical analysis, we evaluate the improvement in quadratic risk offered by our methods compared to the empirical means. Our analysis focuses on a dimensional asymptotics perspective, showing that our methods asymptotically approach an oracle (minimax) improvement as the effective dimension of the data increases. We demonstrate the efficacy of our methods in estimating multiple kernel mean embeddings through experiments on both simulated and real-world datasets.
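To make the first strategy concrete, here is a minimal sketch of a convex-combination estimator with plug-in weights. The paper's actual test statistic and weight formula are not reproduced here; the distance-versus-noise threshold `tau` and the uniform weighting over accepted neighbours are illustrative assumptions.

```python
import numpy as np

def empirical_means(samples):
    """Empirical mean of each bag (samples: list of (n_b, d) arrays)."""
    return np.stack([X.mean(axis=0) for X in samples])

def convex_combination_estimate(samples, tau=2.0):
    """Illustrative plug-in estimator: shrink each empirical mean toward
    'neighbouring' bags whose squared distance is small relative to the
    combined noise level (a stand-in for the paper's testing procedure;
    tau is a hypothetical threshold, not the paper's calibration)."""
    mus = empirical_means(samples)
    B, d = mus.shape
    # per-bag noise level: average coordinate variance over sample size
    sigma2 = np.array([X.var(axis=0, ddof=1).mean() / X.shape[0] for X in samples])
    estimates = np.empty_like(mus)
    for b in range(B):
        # accept bag j as a neighbour of b if the empirical distance is
        # of the order expected under equal true means
        d2 = ((mus - mus[b]) ** 2).sum(axis=1)
        neighbours = d2 <= tau * d * (sigma2 + sigma2[b])
        w = neighbours / neighbours.sum()  # uniform convex weights over accepted bags
        estimates[b] = w @ mus
    return estimates
```

When several bags share the same true mean, averaging their empirical means divides the variance roughly by the number of accepted neighbours, which is the source of the risk reduction over independent empirical means.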
Problem

Research questions and friction points this paper is trying to address.

Estimate multiple high-dimensional mean vectors from independent samples.
Develop convex combination estimators using empirical means and data-dependent weights.
Quantify the asymptotic improvement in quadratic risk over independent empirical means.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Convex combinations of empirical means with data-dependent weights
Testing procedure to identify low-variance neighbouring means, yielding closed-form plug-in weights
Weight selection by minimizing an upper confidence bound on the quadratic risk
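The second weight strategy can be sketched as a small constrained optimization. The bound below (a debiased squared-distance proxy for the bias plus a variance term inflated by a confidence factor `conf`) is an illustrative surrogate, not the paper's UCB; the solver and the `conf` parameter are assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def ucb_weights(mus, sigma2, b, conf=1.0):
    """Convex weights for bag b chosen by minimizing an illustrative
    upper bound on quadratic risk: squared-bias proxy (debiased pairwise
    distances to bag b) plus a confidence-inflated variance term.
    mus: (B, d) empirical means; sigma2: (B,) per-bag noise levels."""
    B = len(sigma2)
    d = mus.shape[1]
    # debiased squared distances to bag b, clipped at zero (bias proxy)
    bias2 = np.maximum(((mus - mus[b]) ** 2).sum(axis=1)
                       - d * (sigma2 + sigma2[b]), 0.0)
    def bound(w):
        return w @ bias2 + conf * (w ** 2) @ (d * sigma2)
    # minimize over the probability simplex: w >= 0, sum(w) = 1
    cons = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    res = minimize(bound, np.full(B, 1.0 / B), bounds=[(0.0, 1.0)] * B,
                   constraints=cons, method="SLSQP")
    return res.x
```

Bags whose empirical means sit far from bag b incur a large bias term and receive near-zero weight, while close bags share weight to reduce the variance term.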