Squared families: Searching beyond regular probability models

📅 2025-03-27
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses the limited expressive power of conventional probabilistic models by proposing and systematically studying squared families: densities obtained by squaring a linear transformation of a statistic. Squared families are singular, but the authors show the singularity is easily removed so that they form regular models. They prove that the Fisher information is a conformal transformation of the Hessian metric induced by a Bregman generator (the normalising constant), and that a single parameter-independent kernel integral suffices to compute the normalising constant, the statistical divergence, and the Fisher information. Within the broader class of $g$-families, only positively homogeneous families and exponential families share this conformal structure, while even-order monomial families also admit the parameter-integral factorisation that exponential families lack. The authors further develop parameter and density estimation theory in both well-specified and misspecified settings, showing that squared families learn sufficiently well-behaved target densities at a rate of $\mathcal{O}(N^{-1/2}) + C n^{-1/4}$, where $N$ is the number of datapoints and $n$ is the number of parameters.

πŸ“ Abstract
We introduce squared families, which are families of probability densities obtained by squaring a linear transformation of a statistic. Squared families are singular; however, their singularity can easily be handled so that they form regular models. After handling the singularity, squared families possess many convenient properties. Their Fisher information is a conformal transformation of the Hessian metric induced from a Bregman generator. The Bregman generator is the normalising constant, and yields a statistical divergence on the family. The normalising constant admits a helpful parameter-integral factorisation, meaning that only one parameter-independent integral needs to be computed for all normalising constants in the family, unlike in exponential families. Finally, the squared family kernel is the only integral that needs to be computed for the Fisher information, statistical divergence and normalising constant. We then describe how squared families are special in the broader class of $g$-families, which are obtained by applying a sufficiently regular function $g$ to a linear transformation of a statistic. After removing special singularities, positively homogeneous families and exponential families are the only $g$-families for which the Fisher information is a conformal transformation of the Hessian metric, where the generator depends on the parameter only through the normalising constant. Even-order monomial families also admit parameter-integral factorisations, unlike exponential families. We study parameter estimation and density estimation in squared families, in the well-specified and misspecified settings. We use a universal approximation property to show that squared families can learn sufficiently well-behaved target densities at a rate of $\mathcal{O}(N^{-1/2})+C n^{-1/4}$, where $N$ is the number of datapoints, $n$ is the number of parameters, and $C$ is some constant.
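The parameter-integral factorisation described in the abstract can be sketched numerically. In the minimal example below (the grid, Gaussian base measure, and monomial statistic are illustrative assumptions, not taken from the paper), the kernel $K = \int \phi(x)\phi(x)^\top \,\mathrm{d}\mu(x)$ is computed once by quadrature, and every normalising constant in the family is then just the quadratic form $Z(\theta) = \theta^\top K \theta$; with the normalising constant as Bregman generator, the induced divergence also reduces to a quadratic form in $K$.

```python
import numpy as np

# Minimal sketch of a squared family on a 1-D grid.
# Density: p_theta(x) = (theta^T phi(x))^2 mu(x) / Z(theta).
# Grid, base measure, and statistic are illustrative choices.

xs = np.linspace(-5.0, 5.0, 4001)             # quadrature grid
dx = xs[1] - xs[0]
mu = np.exp(-xs**2 / 2) / np.sqrt(2 * np.pi)  # base measure: standard normal

# Statistic phi(x): monomials 1, x, x^2, stacked as shape (n, grid).
phi = np.stack([np.ones_like(xs), xs, xs**2])

# Squared-family kernel K = ∫ phi(x) phi(x)^T dmu(x): the single
# parameter-independent integral needed for the whole family.
K = (phi * (mu * dx)) @ phi.T

def normaliser(theta):
    """Z(theta) = theta^T K theta -- no new integral per parameter."""
    return theta @ K @ theta

def density(theta):
    """Normalised density values on the grid."""
    return (theta @ phi) ** 2 * mu / normaliser(theta)

def bregman(theta_a, theta_b):
    """Bregman divergence of the quadratic generator Z,
    which reduces to (a - b)^T K (a - b)."""
    d = theta_a - theta_b
    return d @ K @ d

theta = np.array([1.0, 0.5, -0.2])
p = density(theta)
print(np.isclose((p * dx).sum(), 1.0))  # quadrature mass ≈ 1
```

Changing `theta` requires no further integration: the matrix `K` already carries all the integral information, which is the contrast with exponential families drawn in the abstract.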
Problem

Research questions and friction points this paper is trying to address.

Handling singularities in squared families of probability densities
Exploring properties of Fisher information in squared families
Studying parameter and density estimation in squared families
Innovation

Methods, ideas, or system contributions that make the work stand out.

Singularities of squared families can be removed to yield regular models
Fisher information is a conformal transformation of a Bregman Hessian metric
A single kernel integral yields all normalising constants, divergences, and Fisher information