🤖 AI Summary
Traditional lognormal distributions model extreme values poorly in non-negative, skewed, heavy-tailed data (such as insurance losses) because their tails decay too quickly.
Method: We propose an unsupervised mixture model that couples a lognormal component, capturing the bulk of the distribution, with a Pareto-type tail accommodating the largest observations, in a formulation that is both interpretable and computationally tractable. Maximum likelihood estimation is performed via a stable, accelerated EM algorithm that mitigates numerical instability and slow convergence.
Results: Extensive Monte Carlo simulations and empirical validation on real automobile insurance claim data show that the model achieves goodness-of-fit comparable to state-of-the-art heavy-tailed distributions while yielding more stable parameter estimates, faster convergence, and reduced computational cost. The approach substantially enhances the practicality and robustness of modeling heavy-tailed risk.
📝 Abstract
We develop an unsupervised mixture model for non-negative, skewed and heavy-tailed data, such as losses in actuarial and risk management applications. The mixture has a lognormal component, which is usually appropriate for the body of the distribution, and a Pareto-type tail, aimed at accommodating the largest observations, since the lognormal tail often decays too fast. We show that maximum likelihood estimation can be performed by means of the EM algorithm and that the model is quite flexible in fitting data from different data-generating processes. Simulation experiments and a real-data application to automobile claims suggest that the approach matches the goodness-of-fit of two existing distributions with similar features while being easier to estimate.
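The abstract describes maximum likelihood estimation of a lognormal–Pareto mixture via EM. As a rough illustration of how such an estimator can work, the sketch below implements a plain (non-accelerated) EM loop for a two-component mixture of a lognormal body and a Pareto tail. It is a minimal sketch under simplifying assumptions, not the paper's algorithm: the Pareto threshold `xm` is treated as known and fixed, there is no acceleration or stabilization, and all function and variable names are illustrative.

```python
import numpy as np

def lognorm_pdf(x, mu, sigma):
    """Lognormal density with log-scale mean mu and log-scale sd sigma."""
    return np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        x * sigma * np.sqrt(2 * np.pi)
    )

def pareto_pdf(x, alpha, xm):
    """Pareto density with shape alpha and known threshold xm (zero below xm)."""
    out = np.zeros_like(x, dtype=float)
    mask = x >= xm
    out[mask] = alpha * xm ** alpha / x[mask] ** (alpha + 1)
    return out

def em_lognorm_pareto(x, xm, n_iter=500, tol=1e-8):
    """Plain EM for a lognormal/Pareto two-component mixture (illustrative only).

    Returns (pi, mu, sigma, alpha), where pi is the lognormal mixing weight.
    Assumes xm is known; the paper's accelerated, stabilized EM is not reproduced.
    """
    # Crude initialization from the pooled data.
    mu, sigma = np.log(x).mean(), np.log(x).std()
    pi, alpha = 0.9, 1.5
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior probability that each point came from the lognormal body.
        f1 = pi * lognorm_pdf(x, mu, sigma)
        f2 = (1 - pi) * pareto_pdf(x, alpha, xm)
        denom = f1 + f2
        gamma = f1 / denom
        # M-step: weighted closed-form updates for each component.
        pi = gamma.mean()
        w = gamma / gamma.sum()
        mu = np.sum(w * np.log(x))
        sigma = np.sqrt(np.sum(w * (np.log(x) - mu) ** 2))
        tail = 1.0 - gamma  # zero wherever x < xm, so log(x/xm) < 0 never contributes
        alpha = tail.sum() / np.sum(tail * np.log(x / xm))
        # Stop when the observed-data log-likelihood stabilizes.
        ll = np.sum(np.log(denom))
        if ll - ll_old < tol:
            break
        ll_old = ll
    return pi, mu, sigma, alpha
```

Because the Pareto density vanishes below `xm`, observations under the threshold are attributed entirely to the lognormal component, which keeps the shape-parameter update well defined; the paper's actual estimator additionally addresses the numerical-stability and convergence-speed issues that this bare-bones loop ignores.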