Computing High-dimensional Confidence Sets for Arbitrary Distributions

📅 2025-04-03
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the construction of minimum-volume δ-covering confidence sets under arbitrary high-dimensional distributions, with applications to uncertainty quantification and support estimation. The authors propose the first polynomial-time algorithm that outputs ellipsoidal confidence sets achieving a volume competitive ratio of exp(Õ(d^{2/3})), an exponentially large improvement in volume over prior methods. They also establish a fundamental separation between proper learning (outputting balls) and improper learning (outputting ellipsoids): a computational intractability result shows that no polynomial-time proper learner outputting balls can achieve a volume approximation factor better than exp(Õ(d^{1-o(1)})), revealing an intrinsic advantage of improper learning. The technical framework combines competitive learning over concept classes of bounded VC-dimension, ellipsoidal approximation, high-dimensional geometric analysis, and complexity-theoretic lower-bound arguments.

šŸ“ Abstract
We study the problem of learning a high-density region of an arbitrary distribution over $\mathbb{R}^d$. Given a target coverage parameter $\delta$, and sample access to an arbitrary distribution $D$, we want to output a confidence set $S \subset \mathbb{R}^d$ such that $S$ achieves $\delta$ coverage of $D$, i.e., $\mathbb{P}_{y \sim D}\left[ y \in S \right] \ge \delta$, and the volume of $S$ is as small as possible. This is a central problem in high-dimensional statistics with applications in finding confidence sets, uncertainty quantification, and support estimation. In the most general setting, this problem is statistically intractable, so we restrict our attention to competing with sets from a concept class $C$ with bounded VC-dimension. An algorithm is competitive with class $C$ if, given samples from an arbitrary distribution $D$, it outputs in polynomial time a set that achieves $\delta$ coverage of $D$, and whose volume is competitive with the smallest set in $C$ with the required coverage $\delta$. This problem is computationally challenging even in the basic setting when $C$ is the set of all Euclidean balls. Existing algorithms based on coresets find in polynomial time a ball whose volume is $\exp(\tilde{O}(d/\log d))$-factor competitive with the volume of the best ball. Our main result is an algorithm that finds a confidence set whose volume is $\exp(\tilde{O}(d^{2/3}))$-factor competitive with the optimal ball having the desired coverage. The algorithm is improper (it outputs an ellipsoid). Combined with our computational intractability result for proper learning balls within an $\exp(\tilde{O}(d^{1-o(1)}))$ approximation factor in volume, our results provide an interesting separation between proper and (improper) learning of confidence sets.
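To make the δ-coverage objective from the abstract concrete, here is a minimal sketch of a naive baseline (NOT the paper's algorithm, which is far more sophisticated and comes with volume guarantees): center a ball at the sample mean and grow its radius until it covers a δ fraction of the sample. The function names and the heuristic itself are illustrative assumptions.

```python
import math
import random

def delta_covering_ball(points, delta):
    """Naive heuristic: center at the coordinate-wise mean and take the
    smallest radius covering a delta fraction of the sample. Illustrates
    the coverage constraint only; no volume-competitiveness guarantee."""
    n, d = len(points), len(points[0])
    center = [sum(p[i] for p in points) / n for i in range(d)]
    dists = sorted(math.dist(p, center) for p in points)
    k = math.ceil(delta * n)        # need at least a delta fraction inside
    return center, dists[k - 1]     # k-th smallest distance suffices

def coverage(points, center, radius):
    """Empirical analogue of P_{y ~ D}[y in S] for a ball S."""
    return sum(math.dist(p, center) <= radius for p in points) / len(points)

random.seed(0)
sample = [[random.gauss(0, 1) for _ in range(5)] for _ in range(400)]
c, r = delta_covering_ball(sample, 0.9)
print(coverage(sample, c, r) >= 0.9)  # True: delta coverage on the sample
```

Note that this baseline is proper (it outputs a ball); the paper's separation result says that any such proper polynomial-time learner is inherently limited in volume, which is why its algorithm outputs an ellipsoid instead.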
Problem

Research questions and friction points this paper is trying to address.

Learning high-density regions in arbitrary distributions
Computing small-volume confidence sets in high dimensions
Comparing proper and improper learning of confidence sets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Computes high-density regions for arbitrary distributions
Uses improper learning to output ellipsoid confidence sets
Achieves volume competitive with the optimal Euclidean ball
Chao Gao
Department of Statistics, University of Chicago, Chicago, USA
Liren Shan
Research Assistant Professor, TTIC
Approximation Algorithms · Clustering · Machine Learning
Vaidehi Srinivas
Department of Computer Science, Northwestern University, Evanston, USA
Aravindan Vijayaraghavan
Department of Computer Science, Northwestern University, Evanston, USA