CLEAN-MI: A Scalable and Efficient Pipeline for Constructing High-Quality Neurodata in Motor Imagery Paradigm

📅 2025-06-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address key bottlenecks in motor imagery (MI) brain–computer interfaces—including low signal-to-noise ratio (SNR), heterogeneous electrode configurations, and substantial inter-subject variability across subjects and recording devices—this paper proposes the first end-to-end neural data cleaning pipeline for EEG. The pipeline integrates bandpass filtering, template-driven channel remapping, statistical-guided subject selection, and maximum mean discrepancy (MMD)-based adversarial marginal distribution alignment to achieve automated standardization and quality enhancement of multi-source EEG data. Evaluated on multiple public MI datasets, the method significantly improves SNR and class separability, yielding an average 5.2% gain in downstream classification accuracy and tripling data reuse efficiency. This work establishes a high-quality, standardized data foundation essential for developing robust and generalizable foundation models for MI-BCI systems.

Technology Category

Application Category

📝 Abstract
The construction of large-scale, high-quality datasets is a fundamental prerequisite for developing robust and generalizable foundation models in motor imagery (MI)-based brain-computer interfaces (BCIs). However, EEG signals collected from different subjects and devices are often plagued by low signal-to-noise ratio, heterogeneity in electrode configurations, and substantial inter-subject variability, posing significant challenges for effective model training. In this paper, we propose CLEAN-MI, a scalable and systematic data construction pipeline for constructing large-scale, efficient, and accurate neurodata in the MI paradigm. CLEAN-MI integrates frequency band filtering, channel template selection, subject screening, and marginal distribution alignment to systematically filter out irrelevant or low-quality data and standardize multi-source EEG datasets. We demonstrate the effectiveness of CLEAN-MI on multiple public MI datasets, achieving consistent improvements in data quality and classification performance.
Problem

Research questions and friction points this paper is trying to address.

Addresses low signal-to-noise ratio in EEG signals
Standardizes multi-source heterogeneous EEG datasets
Improves data quality for motor imagery BCIs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Frequency band filtering for noise reduction
Channel template selection for standardization
Marginal distribution alignment for consistency
🔎 Similar Papers
No similar papers found.
Dingkun Liu
Dingkun Liu
Tsinghua University
brain machine interfaceartificial intelligence
Z
Zhu Chen
Ministry of Education Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China
D
Dongrui Wu
Ministry of Education Key Laboratory of Image Processing and Intelligent Control, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan 430074, China; Zhongguancun Academy, Beijing, 100080 China