A Statistical Approach for Synthetic EEG Data Generation

📅 2025-04-22

🤖 AI Summary

In mental health research, acquiring authentic electroencephalography (EEG) data is costly and privacy-sensitive, and synthetic alternatives often represent emotional and psychological signals with low fidelity. To address this, we propose a synthetic EEG generation method based on explicit cross-band correlation modeling. Our approach is the first to explicitly model and constrain inter-band dependencies among the δ, θ, α, β, and γ frequency bands, using them to guide structured random sampling that produces high-fidelity, privacy-preserving synthetic EEG data. Experiments show close agreement between synthetic and real EEG in distributional properties and band-wise correlations (mean absolute error < 0.02). A random forest trained to discriminate synthetic from real data performs at chance level (≈ 50% accuracy), and PERMANOVA detects no significant difference between the two (p > 0.05). The method also outperforms existing approaches, and the synthetic data improves downstream task performance without exposing raw EEG recordings.
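The PERMANOVA check tests whether real and synthetic samples differ in multivariate location. As a hedged sketch, assuming each EEG sample is summarized as a vector of band powers (δ, θ, α, β, γ) and compared with Euclidean distance (the distance measure used in the paper is not stated here), a one-way PERMANOVA using Anderson's pseudo-F statistic with a label-permutation p-value can be written in plain Python. The function names `pseudo_f` and `permanova` and the toy data are illustrative, not taken from the paper.

```python
import math
import random


def pseudo_f(dist, labels):
    """One-way PERMANOVA pseudo-F statistic from a full distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    a = len(groups)
    # Total sum of squares: squared distances over all pairs i < j, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: the same sum restricted to pairs inside one
    # group, divided by that group's size, then summed over groups.
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (n - a))


def permanova(samples, labels, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) for a grouping of samples."""
    n = len(samples)
    # Euclidean distance between band-power vectors (an assumption of this sketch).
    dist = [[math.dist(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    f_obs = pseudo_f(dist, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any real/synthetic association
        if pseudo_f(dist, shuffled) >= f_obs:
            hits += 1
    # Count the observed labelling as part of the null distribution.
    p_value = (hits + 1) / (permutations + 1)
    return f_obs, p_value


if __name__ == "__main__":
    rng = random.Random(1)
    profile = (10.0, 6.0, 8.0, 4.0, 2.0)  # hypothetical mean band powers
    real = [[rng.gauss(m, 1.0) for m in profile] for _ in range(20)]
    synthetic = [[rng.gauss(m, 1.0) for m in profile] for _ in range(20)]
    f_stat, p = permanova(real + synthetic, ["real"] * 20 + ["synthetic"] * 20)
    print(f"pseudo-F = {f_stat:.3f}, p = {p:.3f}")
```

A p-value above 0.05 means the test found no significant difference between the two groups; it does not by itself prove the distributions are identical, which is why the distance-based test is paired with correlation and classification checks.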

📝 Abstract

Electroencephalogram (EEG) data is crucial for diagnosing mental health conditions but is costly and time-consuming to collect at scale. Synthetic data generation offers a promising solution to augment datasets for machine learning applications. However, generating high-quality synthetic EEG that preserves emotional and mental health signals remains challenging. This study proposes a method combining correlation analysis and random sampling to generate realistic synthetic EEG data. We first analyze interdependencies between EEG frequency bands using correlation analysis. Guided by this structure, we generate synthetic samples via random sampling. Samples with high correlation to real data are retained and evaluated through distribution analysis and classification tasks. A Random Forest model trained to distinguish synthetic from real EEG performs at chance level, indicating high fidelity. The generated synthetic data closely matches the statistical and structural properties of the original EEG, with similar correlation coefficients and no significant differences in PERMANOVA tests. This method provides a scalable, privacy-preserving approach for augmenting EEG datasets, enabling more efficient model training in mental health research.
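The three steps of the abstract (estimate inter-band correlations, sample guided by that structure, retain samples highly correlated with real data) can be sketched as follows. This is one possible reading, not the authors' implementation. It assumes each EEG sample is a vector of band powers for δ, θ, α, β, and γ; it draws new vectors from a multivariate normal with the real per-band means, standard deviations, and correlation matrix (via a Cholesky factor), which is a stand-in for whatever sampler the paper uses; and it interprets "high correlation to real data" as a Pearson correlation of at least a hypothetical threshold of 0.9 with some real sample. All function names and the toy data are illustrative.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def band_stats(samples):
    """Per-band means, standard deviations and the band-by-band correlation matrix."""
    cols = list(zip(*samples))  # one tuple of values per frequency band
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) for c, m in zip(cols, means)]
    corr = [[pearson(ci, cj) for cj in cols] for ci in cols]
    return means, sds, corr


def cholesky(m):
    """Lower-triangular L with L @ L.T == m (m must be symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def sample_correlated(means, sds, corr, n, rng):
    """Draw n band-power vectors whose inter-band correlations follow corr."""
    low = cholesky(corr)
    k = len(means)
    out = []
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]
        # L z turns independent standard normals into normals with correlation corr.
        cz = [sum(low[i][j] * z[j] for j in range(i + 1)) for i in range(k)]
        out.append([m + s * c for m, s, c in zip(means, sds, cz)])
    return out


def retain_close(synthetic, real, threshold=0.9):
    """Keep synthetic samples whose band profile correlates >= threshold with some real sample."""
    return [s for s in synthetic if max(pearson(s, r) for r in real) >= threshold]


def corr_mae(a, b):
    """Mean absolute difference between the off-diagonal entries of two correlation matrices."""
    k = len(a)
    diffs = [abs(a[i][j] - b[i][j]) for i in range(k) for j in range(i + 1, k)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "real" data: a shared latent factor induces correlations across the five bands.
    base = (10.0, 6.0, 8.0, 4.0, 2.0)
    loadings = (0.8, 0.6, 0.7, 0.5, 0.4)
    real = []
    for _ in range(200):
        f = rng.gauss(0.0, 1.0)
        real.append([b + w * f + rng.gauss(0.0, 0.5) for b, w in zip(base, loadings)])
    means, sds, corr = band_stats(real)
    synthetic = sample_correlated(means, sds, corr, 300, rng)
    kept = retain_close(synthetic, real)
    print(len(kept), "of", len(synthetic), "synthetic samples kept")
    print("band-correlation MAE:", corr_mae(corr, band_stats(kept)[2]))
```

Sampling through the Cholesky factor is what makes the inter-band structure explicit: the correlation matrix estimated from real data is imposed directly on the draws, and `corr_mae` then measures how well the retained samples preserve it, in the spirit of the band-wise correlation error reported in the summary.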

Problem

Generating high-quality synthetic EEG data that preserves emotional signals

Efficiently augmenting EEG datasets for machine learning applications

Ensuring that synthetic EEG matches the statistical properties of real data

Innovation

Combines correlation analysis of EEG frequency bands with random sampling

Generates synthetic EEG with high fidelity

Preserves the statistical and structural properties of real EEG

🔎 Similar Papers

Enhancing EEG Signal-Based Emotion Recognition with Synthetic Data: Diffusion Model Approach