Modeling and estimating skewed and heavy-tailed populations via unsupervised mixture models

📅 2025-05-28

🤖 AI Summary

Problem: Lognormal distributions often underfit the extreme values of non-negative, skewed, heavy-tailed data such as insurance losses, because the lognormal tail decays too quickly. Method: The paper proposes an unsupervised mixture model that couples a lognormal component, which captures the body of the distribution, with a Pareto-type tail that accommodates the largest observations. Maximum likelihood estimation is carried out with the EM algorithm. Results: Simulation experiments and an application to real automobile insurance claims suggest that the model matches the goodness-of-fit of two existing distributions with similar features while being easier to estimate.

📝 Abstract

We develop an unsupervised mixture model for non-negative, skewed and heavy-tailed data, such as losses in actuarial and risk management applications. The mixture has a lognormal component, which is usually appropriate for the body of the distribution, and a Pareto-type tail, aimed at accommodating the largest observations, since the lognormal tail often decays too fast. We show that maximum likelihood estimation can be performed by means of the EM algorithm and that the model is quite flexible in fitting data from different data-generating processes. Simulation experiments and a real-data application to automobile claims suggest that the approach matches two existing distributions with similar features in terms of goodness-of-fit, while being easier to estimate.
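
The estimation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a two-component mixture of a lognormal density and a Pareto Type I density whose threshold `x_min` is known in advance, whereas the paper's "Pareto-type" tail may be parameterized differently. The names `simulate`, `fit_em`, `x_min`, and all starting values are hypothetical choices made for illustration.

```python
import math
import random


def lognormal_pdf(x, mu, sigma):
    """Lognormal density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))


def pareto_pdf(x, x_min, alpha):
    """Pareto Type I density; zero below the (assumed known) threshold x_min."""
    if x < x_min:
        return 0.0
    log_density = math.log(alpha) + alpha * math.log(x_min) - (alpha + 1.0) * math.log(x)
    return math.exp(log_density)


def simulate(n, pi, mu, sigma, x_min, alpha, seed=0):
    """Draw n points: lognormal with probability pi, Pareto otherwise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        if rng.random() < pi:
            data.append(rng.lognormvariate(mu, sigma))
        else:
            # paretovariate has minimum 1, so rescale to start at x_min.
            data.append(x_min * rng.paretovariate(alpha))
    return data


def fit_em(data, x_min, n_iter=100):
    """EM for the assumed lognormal + Pareto mixture with fixed x_min."""
    n = len(data)
    logs = [math.log(x) for x in data]
    log_x_min = math.log(x_min)

    # Hypothetical starting values: pooled lognormal moments, arbitrary pi and alpha.
    pi = 0.9
    mu = sum(logs) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in logs) / n)
    alpha = 2.0
    loglik = []

    for _ in range(n_iter):
        # E-step: posterior probability that each point comes from the Pareto tail.
        resp = []
        ll = 0.0
        for x in data:
            body = pi * lognormal_pdf(x, mu, sigma)
            tail = (1.0 - pi) * pareto_pdf(x, x_min, alpha)
            resp.append(tail / (body + tail))
            ll += math.log(body + tail)
        loglik.append(ll)  # log-likelihood at the current parameters

        # M-step: weighted maximum likelihood updates in closed form.
        s_tail = sum(resp)
        s_body = n - s_tail
        pi = s_body / n
        mu = sum((1.0 - r) * v for r, v in zip(resp, logs)) / s_body
        sigma = math.sqrt(
            sum((1.0 - r) * (v - mu) ** 2 for r, v in zip(resp, logs)) / s_body
        )
        excess = sum(r * (v - log_x_min) for r, v in zip(resp, logs))
        if excess > 0.0:
            alpha = s_tail / excess

    return {"pi": pi, "mu": mu, "sigma": sigma, "alpha": alpha, "loglik": loglik}


if __name__ == "__main__":
    sample = simulate(2000, pi=0.8, mu=0.0, sigma=0.5, x_min=2.0, alpha=2.5)
    fit = fit_em(sample, x_min=2.0)
    print({k: round(v, 3) for k, v in fit.items() if k != "loglik"})
```

Fixing `x_min` keeps every M-step update in closed form, which is one plausible reading of "easier to estimate"; treating the threshold as unknown would require an extra search or profile-likelihood step that this sketch does not attempt.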

Problem

Research questions and friction points this paper is trying to address.

Modeling skewed, heavy-tailed data without supervision

Combining a lognormal body with a Pareto-type tail

Making estimation easier while keeping goodness-of-fit comparable to existing distributions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised mixture model for skewed data

Lognormal body and Pareto-type tail components

EM algorithm for maximum likelihood estimation
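
As a worked illustration of how EM applies to such a mixture, the updates below are written for a simplified case that is an assumption rather than the paper's exact specification: a lognormal body with parameters $\mu, \sigma$, a Pareto Type I tail with known threshold $x_m$ and shape $\alpha$, and mixing weight $\pi$ on the body. The symbols $r_i$, $x_m$, and $\pi$ are introduced here for illustration only.

```latex
% Assumed mixture density (Pareto Type I tail with known threshold x_m)
f(x) = \pi \, f_{\mathrm{LN}}(x;\mu,\sigma) + (1-\pi)\, f_{\mathrm{P}}(x;x_m,\alpha),
\qquad
f_{\mathrm{P}}(x;x_m,\alpha) = \frac{\alpha\, x_m^{\alpha}}{x^{\alpha+1}}\,\mathbf{1}\{x \ge x_m\}.

% E-step: posterior probability that x_i belongs to the tail component
r_i = \frac{(1-\pi)\, f_{\mathrm{P}}(x_i;x_m,\alpha)}{f(x_i)}.

% M-step: weighted maximum likelihood updates
\pi = 1 - \frac{1}{n}\sum_{i=1}^{n} r_i,
\qquad
\mu = \frac{\sum_{i}(1-r_i)\log x_i}{\sum_{i}(1-r_i)},
\qquad
\sigma^{2} = \frac{\sum_{i}(1-r_i)(\log x_i-\mu)^{2}}{\sum_{i}(1-r_i)},
\qquad
\alpha = \frac{\sum_{i} r_i}{\sum_{i} r_i \log(x_i/x_m)}.
```

Under these assumptions every M-step update is available in closed form, so each iteration costs a single pass over the data.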
