CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm

📅 2025-06-13

🤖 AI Summary

Motor imagery (MI) brain–computer interfaces face several key bottlenecks, including low signal-to-noise ratio (SNR), heterogeneous electrode configurations, and substantial variability across subjects and recording devices. To address them, this paper proposes the first end-to-end neural data cleaning pipeline for EEG. The pipeline integrates bandpass filtering, template-driven channel remapping, statistically guided subject selection, and maximum mean discrepancy (MMD)-based adversarial marginal distribution alignment to automatically standardize multi-source EEG data and improve its quality. Evaluated on multiple public MI datasets, the method significantly improves SNR and class separability, yielding an average 5.2% gain in downstream classification accuracy and tripling data reuse efficiency. This work establishes the high-quality, standardized data foundation needed to develop robust and generalizable foundation models for MI-BCI systems.
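
The summary names maximum mean discrepancy (MMD) as the criterion for aligning marginal distributions, but it does not give the kernel, the bandwidth, the feature space, or how the adversarial component uses it. As a minimal sketch under those unknowns, the biased estimate of squared MMD with a Gaussian kernel is MMD^2(X, Y) = mean k(x, x') + mean k(y, y') - 2 * mean k(x, y). The function names and the bandwidth `sigma` below are hypothetical choices, not the paper's implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mean_kernel(a: List[Vector], b: List[Vector], sigma: float) -> float:
    """Average kernel value over all pairs (u, v) with u from a and v from b."""
    total = sum(gaussian_kernel(u, v, sigma) for u in a for v in b)
    return total / (len(a) * len(b))


def mmd_squared(xs: List[Vector], ys: List[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of squared MMD between two samples.

    Hypothetical helper: in an alignment setting, xs and ys would be feature
    vectors from two subjects or datasets whose marginals should match.
    """
    return (
        mean_kernel(xs, xs, sigma)
        + mean_kernel(ys, ys, sigma)
        - 2.0 * mean_kernel(xs, ys, sigma)
    )


if __name__ == "__main__":
    source = [[0.0, 0.1], [0.2, -0.1], [-0.1, 0.0]]
    shifted = [[x + 2.0 for x in v] for v in source]
    print(mmd_squared(source, source))   # identical samples give 0.0
    print(mmd_squared(source, shifted))  # a shifted sample gives a clearly positive value
```

Driving this value toward zero, for example as a loss term during training, pushes the two feature distributions together; the bandwidth `sigma` controls which differences the estimate is sensitive to.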

📝 Abstract

The construction of large-scale, high-quality datasets is a fundamental prerequisite for developing robust and generalizable foundation models in motor imagery (MI)-based brain-computer interfaces (BCIs). However, EEG signals collected from different subjects and devices are often plagued by low signal-to-noise ratio, heterogeneous electrode configurations, and substantial inter-subject variability, posing significant challenges for effective model training. In this paper, we propose CLEAN-MI, a scalable and systematic pipeline for constructing large-scale, efficient, and accurate neurodata in the MI paradigm. CLEAN-MI integrates frequency band filtering, channel template selection, subject screening, and marginal distribution alignment to systematically filter out irrelevant or low-quality data and standardize multi-source EEG datasets. We demonstrate the effectiveness of CLEAN-MI on multiple public MI datasets, achieving consistent improvements in data quality and classification performance.
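
The abstract lists subject screening as a step but does not say which statistic decides whether a subject is kept. The sketch below assumes each subject already has a scalar quality score (for instance, within-subject decoding accuracy) and drops subjects that fall more than `k` standard deviations below the cohort mean. Both the score and the rule are illustrative assumptions, and `screen_subjects` is a hypothetical name.

```python
import statistics
from typing import Dict, List


def screen_subjects(scores: Dict[str, float], k: float = 1.0) -> List[str]:
    """Return the IDs of subjects kept by a simple outlier rule.

    Assumed rule (not from the paper): keep a subject if its quality score is
    at least mean - k * stdev of the cohort's scores.
    """
    values = list(scores.values())
    if len(values) < 2:
        # The sample standard deviation needs at least two subjects.
        return sorted(scores)
    threshold = statistics.mean(values) - k * statistics.stdev(values)
    return sorted(subject for subject, score in scores.items() if score >= threshold)


if __name__ == "__main__":
    # Hypothetical per-subject scores, not results from the paper.
    scores = {"S01": 0.82, "S02": 0.78, "S03": 0.51, "S04": 0.80, "S05": 0.76}
    print(screen_subjects(scores))  # ['S01', 'S02', 'S04', 'S05']
```

A larger `k` keeps more subjects; a rule anchored to chance-level accuracy would be another plausible choice.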

Problem

Research questions and friction points this paper is trying to address.

Addresses low signal-to-noise ratio in EEG signals

Standardizes multi-source heterogeneous EEG datasets

Improves data quality for motor imagery BCIs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency band filtering for noise reduction

Channel template selection for standardization

Marginal distribution alignment for consistency
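
The list mentions channel template selection without giving the template itself. One simple stand-in, assumed here rather than taken from the paper, is to use the channels that every dataset shares, then reorder each recording's rows to that common order so that data from different montages can be stacked into one array. `common_template` and `remap_to_template` are hypothetical names, and channels are matched on upper-cased labels.

```python
from typing import List, Sequence


def common_template(channel_lists: Sequence[Sequence[str]]) -> List[str]:
    """Channels present in every dataset, kept in the first dataset's order.

    Labels are upper-cased so that, e.g., 'Cz' and 'CZ' are treated as equal.
    """
    normalized = [[ch.upper() for ch in chans] for chans in channel_lists]
    shared = set(normalized[0]).intersection(*normalized[1:])
    return [ch for ch in normalized[0] if ch in shared]


def remap_to_template(
    data: Sequence[Sequence[float]],
    channels: Sequence[str],
    template: Sequence[str],
) -> List[List[float]]:
    """Reorder the rows of a (channels x samples) recording to the template.

    Channels not in the template are dropped; a KeyError means the recording
    lacks a template channel.
    """
    row_of = {ch.upper(): i for i, ch in enumerate(channels)}
    return [list(data[row_of[ch]]) for ch in template]


if __name__ == "__main__":
    montage_a = ["Fz", "C3", "Cz", "C4", "Pz"]
    montage_b = ["C4", "CZ", "C3", "Oz"]
    template = common_template([montage_a, montage_b])
    print(template)  # ['C3', 'CZ', 'C4']

    # Toy recording for montage_b: one row of samples per channel.
    recording_b = [[4.0, 4.1], [0.0, 0.1], [3.0, 3.1], [9.0, 9.1]]
    print(remap_to_template(recording_b, montage_b, template))
    # [[3.0, 3.1], [0.0, 0.1], [4.0, 4.1]]
```

Intersecting montages is simple but can discard many channels when one dataset has a sparse montage; a fixed, hand-chosen template is a common alternative.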
