🤖 AI Summary
This work addresses statistical and optimization challenges arising from discontinuities and non-differentiability of normalization functions in modeling group invariance, particularly permutation invariance, by establishing a theory of differentiable approximations. We propose a concise construction based on sorting and symmetric averaging, which for the first time systematically resolves permutation-invariant statistical testing in high dimensions. A low-dimensional equivalent embedding is developed to achieve sample-efficient invariant testing. We derive dimension-free optimal convergence rates for density estimation under permutation invariance. Furthermore, we prove that the metric entropy of the class of permutation-invariant functions decays exponentially, substantially reducing model complexity. Finally, the computational complexity of embedding construction is reduced from $O(n!)$ to $O(n \log n)$. Collectively, these results provide both a rigorous theoretical foundation and an efficient algorithmic framework for high-dimensional invariant learning.
📝 Abstract
Permutation invariance is among the most common symmetries that can be exploited to simplify complex problems in machine learning (ML). There has been a tremendous surge of research activity in building permutation invariant ML architectures. However, less attention has been given to: (1) how to statistically test for permutation invariance of coordinates in a random vector where the dimension is allowed to grow with the sample size; (2) how to leverage permutation invariance in estimation problems and how it helps reduce dimensions. In this paper, we take a step back and examine these questions in several fundamental problems: (i) testing the assumption of permutation invariance of multivariate distributions; (ii) estimating permutation invariant densities; (iii) analyzing the metric entropy of permutation invariant function classes and comparing it with that of their counterparts without permutation invariance; (iv) deriving an embedding of permutation invariant reproducing kernel Hilbert spaces for efficient computation. In particular, our methods for (i) and (iv) are based on a sorting trick, and our method for (ii) is based on an averaging trick. These tricks substantially simplify the exploitation of permutation invariance.
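The abstract does not spell out the sorting and averaging tricks, so the following is only a minimal sketch of the general ideas, not the paper's construction. It assumes that "sorting" means mapping a vector to its order statistics and that "averaging" means symmetrizing a function over all permutations of its arguments. The names `sort_embed`, `symmetrize`, and `position_weighted_sum` are hypothetical and chosen purely for illustration.

```python
import itertools


def sort_embed(x):
    """Hypothetical sorting map: every permutation of x has the same image."""
    return tuple(sorted(x))


def symmetrize(f, x):
    """Hypothetical averaging map: mean of f over all permutations of x.

    This brute-force version evaluates f n! times, so it is only
    practical for very small n.
    """
    perms = list(itertools.permutations(x))
    return sum(f(p) for p in perms) / len(perms)


def position_weighted_sum(v):
    # Deliberately NOT permutation invariant: weights depend on position.
    return sum((i + 1) * vi for i, vi in enumerate(v))


x = (3.0, 1.0, 2.0)
y = (1.0, 2.0, 3.0)  # a permutation of x

# The raw function distinguishes x from y ...
raw_differs = position_weighted_sum(x) != position_weighted_sum(y)

# ... but composing with the sort (O(n log n)) makes it invariant,
sorted_agrees = (
    position_weighted_sum(sort_embed(x)) == position_weighted_sum(sort_embed(y))
)

# ... and so does averaging over all permutations (O(n!) evaluations).
averaged_agrees = abs(
    symmetrize(position_weighted_sum, x) - symmetrize(position_weighted_sum, y)
) < 1e-12
```

Both routes yield a permutation invariant quantity from a non-invariant one. The sort needs a single pass of cost $O(n \log n)$, whereas naive averaging enumerates all $n!$ orderings, which is one way to see why a sorting-based embedding is attractive for computation.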