Modeling and estimating skewed and heavy-tailed populations via unsupervised mixture models

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Lognormal distributions often fit the body of non-negative, skewed, heavy-tailed data such as insurance losses, but their tail decays too quickly to capture the largest observations. Method: The paper proposes an unsupervised mixture model that combines a lognormal component for the body of the distribution with a Pareto-type tail for the extremes. Maximum likelihood estimation is carried out with the EM algorithm. Results: Simulation experiments and an application to real automobile insurance claim data suggest that the model matches the goodness-of-fit of two existing distributions with similar features while being easier to estimate.

📝 Abstract
We develop an unsupervised mixture model for non-negative, skewed and heavy-tailed data, such as losses in actuarial and risk management applications. The mixture has a lognormal component, which is usually appropriate for the body of the distribution, and a Pareto-type tail, aimed at accommodating the largest observations, since the lognormal tail often decays too fast. We show that maximum likelihood estimation can be performed by means of the EM algorithm and that the model is quite flexible in fitting data from different data-generating processes. Simulation experiments and a real-data application to automobile claims suggest that the approach matches two existing distributions with similar features in terms of goodness-of-fit, while being easier to estimate.
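As a rough illustration of the structure described above, a two-component density of this kind can be written as follows. The abstract does not give the exact parameterization, so the classical Pareto form with a threshold $x_{\min}$ and the symbols $\pi$, $\mu$, $\sigma$, $\alpha$ are assumptions made here for the sketch, not the authors' notation.

```latex
f(x;\theta) = \pi \, f_{\mathrm{LN}}(x;\mu,\sigma) + (1-\pi)\, f_{\mathrm{P}}(x;\alpha,x_{\min}),
\qquad 0 < \pi < 1,

f_{\mathrm{LN}}(x;\mu,\sigma) = \frac{1}{x\,\sigma\sqrt{2\pi}}
  \exp\!\left\{-\frac{(\log x-\mu)^2}{2\sigma^2}\right\}, \quad x > 0,

f_{\mathrm{P}}(x;\alpha,x_{\min}) = \frac{\alpha\, x_{\min}^{\alpha}}{x^{\alpha+1}}
  \,\mathbf{1}\{x \ge x_{\min}\}, \quad \alpha > 0 .
```

Under this form the lognormal term governs the body, while for large $x$ the Pareto term dominates because it decays polynomially rather than faster than any power of $x$.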
Problem

Research questions and friction points this paper is trying to address.

Unsupervised modeling of skewed, heavy-tailed data
Combining a lognormal body with a Pareto-type tail
Easier estimation with goodness-of-fit comparable to existing distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised mixture model for skewed data
Lognormal body and Pareto-type tail components
EM algorithm for maximum likelihood estimation
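The three ideas listed above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not the authors' algorithm: the tail is taken to be a classical Pareto with a known threshold `x_min`, the mixing weight is called `p`, and the function names (`fit_em`, `simulate`, and the two density helpers) are hypothetical. With a fixed threshold, both M-step updates are closed-form weighted maximum likelihood estimates.

```python
import numpy as np


def lognormal_pdf(x, mu, sigma):
    """Lognormal density with log-mean mu and log-standard-deviation sigma."""
    z = (np.log(x) - mu) / sigma
    return np.exp(-0.5 * z**2) / (x * sigma * np.sqrt(2.0 * np.pi))


def pareto_pdf(x, alpha, x_min):
    """Classical Pareto density; zero below the (assumed known) threshold x_min."""
    pdf = np.zeros_like(x, dtype=float)
    tail = x >= x_min
    pdf[tail] = alpha * x_min**alpha / x[tail] ** (alpha + 1.0)
    return pdf


def fit_em(x, x_min, n_iter=500, tol=1e-8):
    """EM for p * Lognormal(mu, sigma) + (1 - p) * Pareto(alpha, x_min)."""
    x = np.asarray(x, dtype=float)
    log_x = np.log(x)
    # Crude starting values (an arbitrary choice for this sketch).
    p, mu, sigma, alpha = 0.9, log_x.mean(), log_x.std(), 2.0
    loglik = []
    for _ in range(n_iter):
        body = p * lognormal_pdf(x, mu, sigma)
        tail = (1.0 - p) * pareto_pdf(x, alpha, x_min)
        mix = body + tail
        loglik.append(np.log(mix).sum())
        if len(loglik) > 1 and abs(loglik[-1] - loglik[-2]) < tol:
            break
        # E-step: posterior probability that each point comes from the body.
        r = body / mix
        w = 1.0 - r
        # M-step: weighted MLEs for each component and the mixing weight.
        p = r.mean()
        mu = np.sum(r * log_x) / r.sum()
        sigma = np.sqrt(np.sum(r * (log_x - mu) ** 2) / r.sum())
        alpha = w.sum() / np.sum(w * np.log(x / x_min))
    return {"p": p, "mu": mu, "sigma": sigma, "alpha": alpha, "loglik": loglik}


def simulate(n, p, mu, sigma, alpha, x_min, seed=0):
    """Draw n points from the mixture (Pareto draws by inverse transform)."""
    rng = np.random.default_rng(seed)
    from_body = rng.random(n) < p
    body = rng.lognormal(mean=mu, sigma=sigma, size=n)
    tail = x_min * (1.0 - rng.random(n)) ** (-1.0 / alpha)
    return np.where(from_body, body, tail)


if __name__ == "__main__":
    data = simulate(5000, p=0.8, mu=0.0, sigma=0.5, alpha=1.5, x_min=3.0, seed=1)
    print(fit_em(data, x_min=3.0))
```

Observations below `x_min` receive zero Pareto density, so the E-step assigns them entirely to the lognormal component. Treating the threshold as known is a simplification chosen to keep the sketch short; the paper's own treatment of the tail may differ.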
Marco Bee
Professore di statistica economica, Università di Trento
Flavio Santi
Department of Economics and Management, University of Trento - Italy