🤖 AI Summary
Extreme events such as stock market crashes, earthquakes, and pandemics are rare and catastrophic, and their effects propagate across interconnected systems. Their rarity leaves data-driven models with too little training data. To address this, we present the first survey of synthetic data generation methods tailored to extreme events. The survey covers generative models (GANs, diffusion models, VAEs), large language models, statistically grounded modeling, and specialized training and sampling strategies for heavy-tailed distributions. We introduce a tailored evaluation framework spanning statistical fidelity, dependence preservation, visual plausibility, and task-oriented utility, and analyze how well each metric applies under heavy-tailed distributions. We also summarize benchmark datasets across finance, meteorology, geoscience, and epidemiology, identify underexplored domains, including behavioral finance, wildfire dynamics, and windstorm modeling, and distill key open challenges to advance the reliability and practicality of extreme-event modeling.
📝 Abstract
Extreme events, such as market crashes, natural disasters, and pandemics, are rare but catastrophic, often triggering cascading failures across interconnected systems. Accurate prediction and early warning can help minimize losses and improve preparedness. Data-driven methods offer powerful capabilities for extreme event modeling, but they require abundant training data, and extreme event data is inherently scarce. Synthetic data generation has emerged as a powerful solution to this scarcity. However, existing surveys focus on general-purpose data and emphasize privacy preservation rather than the distinct performance requirements of extreme events. This survey provides the first overview of synthetic data generation for extreme events. We systematically review generative modeling techniques and large language models, particularly those enhanced by statistical theory and by specialized training and sampling mechanisms designed to capture heavy-tailed distributions. We summarize benchmark datasets and introduce a tailored evaluation framework covering statistical, dependence, visual, and task-oriented metrics. A central contribution is our in-depth analysis of each metric's applicability to extreme regimes and of its domain-specific adaptations, providing actionable guidance for model evaluation in extreme settings. We categorize key application domains and identify underexplored areas such as behavioral finance, wildfires, earthquakes, windstorms, and infectious outbreaks. Finally, we outline open challenges, providing a structured foundation for advancing synthetic rare-event research.