🤖 AI Summary
In mental health research, acquiring authentic electroencephalography (EEG) data is costly and privacy-sensitive, and generating synthetic EEG that preserves emotional and psychological signals remains difficult. To address this, we propose a synthetic EEG generation method based on explicit cross-band correlation modeling. Our approach is the first to explicitly model and constrain inter-band dependencies among the δ, θ, α, β, and γ frequency bands, and it uses that structure to guide random sampling, producing high-fidelity, privacy-preserving synthetic EEG data. In the reported experiments, synthetic and real EEG data are statistically indistinguishable: their distributional properties agree, band-wise correlations match closely (mean absolute error < 0.02), a random forest trained to separate the two performs at chance (accuracy ≈ 50%), and PERMANOVA finds no significant difference (p > 0.05). The method is reported to outperform existing approaches. Furthermore, the synthetic data improves downstream task performance without exposing raw recordings.
📝 Abstract
Electroencephalogram (EEG) data is crucial for diagnosing mental health conditions but is costly and time-consuming to collect at scale. Synthetic data generation offers a promising solution to augment datasets for machine learning applications. However, generating high-quality synthetic EEG that preserves emotional and mental health signals remains challenging. This study proposes a method combining correlation analysis and random sampling to generate realistic synthetic EEG data. We first analyze interdependencies between EEG frequency bands using correlation analysis. Guided by this structure, we generate synthetic samples via random sampling. Samples with high correlation to real data are retained and evaluated through distribution analysis and classification tasks. A Random Forest model trained to distinguish synthetic from real EEG performs at chance level, indicating high fidelity. The generated synthetic data closely matches the statistical and structural properties of the original EEG, with similar correlation coefficients and no significant differences in PERMANOVA tests. This method provides a scalable, privacy-preserving approach for augmenting EEG datasets, enabling more efficient model training in mental health research.
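
The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one plausible reading: estimate the covariance between the five band powers, then draw correlated Gaussian samples through a Cholesky factor and compare the band-wise correlation matrices. The toy "real" data, the Gaussian assumption, and every function name here are illustrative assumptions, not the authors' implementation. The retention step is omitted because the source does not state its criterion.

```python
import math
import random

BANDS = ["delta", "theta", "alpha", "beta", "gamma"]


def mean_vector(rows):
    """Column-wise mean of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows):
    """Unbiased sample covariance matrix between columns (bands)."""
    n, d = len(rows), len(rows[0])
    mu = mean_vector(rows)
    return [[sum((r[i] - mu[i]) * (r[j] - mu[j]) for r in rows) / (n - 1)
             for j in range(d)] for i in range(d)]


def correlation(rows):
    """Pearson correlation matrix derived from the covariance matrix."""
    cov = covariance(rows)
    d = len(cov)
    return [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(d)]
            for i in range(d)]


def cholesky(a):
    """Lower-triangular factor L with L * L^T == a (a must be positive definite)."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_synthetic(real_rows, n_samples, rng):
    """Draw Gaussian samples that share the mean and covariance of real_rows."""
    mu = mean_vector(real_rows)
    low = cholesky(covariance(real_rows))
    d = len(mu)
    out = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(d)]  # independent standard normals
        out.append([mu[i] + sum(low[i][k] * z[k] for k in range(i + 1))
                    for i in range(d)])
    return out


def correlation_mae(rows_a, rows_b):
    """Mean absolute difference over the off-diagonal band-pair correlations."""
    ca, cb = correlation(rows_a), correlation(rows_b)
    d = len(ca)
    diffs = [abs(ca[i][j] - cb[i][j]) for i in range(d) for j in range(i + 1, d)]
    return sum(diffs) / len(diffs)


def make_toy_real(n, rng):
    """Hypothetical stand-in for real band powers: a shared factor plus noise."""
    rows = []
    for _ in range(n):
        shared = rng.gauss(0.0, 1.0)
        rows.append([10.0 - 1.5 * j + 0.8 * shared + rng.gauss(0.0, 0.6)
                     for j in range(len(BANDS))])
    return rows


rng = random.Random(0)
real = make_toy_real(500, rng)
synthetic = sample_synthetic(real, 500, rng)
mae = correlation_mae(real, synthetic)
print(f"mean absolute error between correlation matrices: {mae:.3f}")
```

A small error here shows only that the sampler reproduces second-order band structure on toy data. Checks like the chance-level Random Forest test and PERMANOVA would still be needed on actual recordings.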