DLink: Distilling Layer-wise and Dominant Knowledge from EEG Foundation Models

2026-04-16
Citations: 0
Influential: 0

AI Summary
This work addresses the challenge of deploying EEG foundation models in embedded brain-computer interfaces, where existing knowledge distillation approaches underperform because they neglect intermediate-layer semantics and frequency-domain structure. To overcome these limitations, we propose DLink, a distillation framework that dynamically aggregates critical intermediate representations from the teacher model via adaptive routing, employs a "mimic-then-compress" student architecture to avoid heavy classification heads, and introduces spectral distillation to align frequency-domain representations and preserve intrinsic EEG oscillatory characteristics. By integrating dynamic layer aggregation, structured spatiotemporal compression, and frequency-aware alignment, DLink significantly outperforms current distillation baselines across four EEG benchmarks, achieving performance close to fully fine-tuned foundation models while substantially reducing model parameters and inference overhead.
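
The summary does not say how the adaptive routing scores teacher layers. A minimal sketch, assuming a softmax gate over mean-pooled layer features, illustrates what input-dependent layer aggregation computes; the gate vector `w_gate`, the function names, and the tensor shapes are hypothetical, not taken from the paper.

```python
import numpy as np


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def route_teacher_layers(layer_feats: np.ndarray, w_gate: np.ndarray):
    """Aggregate teacher hidden states across layers with input-dependent weights.

    layer_feats: (L, T, D) hidden states from L teacher layers (T tokens, D dims).
    w_gate:      (D,) hypothetical gate parameters (learned in practice).
    Returns the (T, D) distillation target and the (L,) layer weights.
    """
    pooled = layer_feats.mean(axis=1)   # (L, D): one summary vector per layer
    scores = pooled @ w_gate            # (L,): one routing logit per layer
    weights = softmax(scores)           # (L,): non-negative, sums to 1
    # Weighted mix of all teacher layers, used as the student's target
    target = np.einsum("l,ltd->td", weights, layer_feats)
    return target, weights


rng = np.random.default_rng(0)
feats = rng.standard_normal((12, 30, 64))  # e.g. 12 layers, 30 tokens, 64 dims
target, weights = route_teacher_layers(feats, rng.standard_normal(64))
print(target.shape, weights.round(3))
```

Because the weights form a convex combination that changes with each input, the target can lean on whichever intermediate layers carry the task-relevant signal, rather than always distilling from the final layer.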

πŸ“ Abstract
EEG foundation models (FMs) achieve strong cross-subject and cross-task generalization but impose substantial computational and memory costs that hinder deployment on embedded BCI systems. Knowledge distillation is a natural solution; however, conventional methods fail for EEG FMs because task-relevant semantics are often distributed across intermediate layers, and aggressive dimensionality reduction can distort oscillatory structure via representational collapse and aliasing. To address these challenges, we propose DLink (Distilling Layer-wise and Dominant Knowledge), a unified framework for transferring knowledge from large EEG FMs to compact students with three key innovations: (1) a dynamic Router that adaptively aggregates teacher layers to capture dominant intermediate representations; (2) an EEG MiC student with a Mimic-then-Compress pipeline, which inherits high-dimensional teacher features and then applies structured spatio-temporal compression to avoid a heavy classification head; and (3) spectral distillation that aligns teacher-student representations in the frequency domain to regularize compression and mitigate aliasing and temporal jitter. Experiments on four EEG benchmarks show that DLink enables compact students to outperform lightweight baselines while approaching fully fine-tuned FM performance at substantially lower model size and inference cost.
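
The abstract does not give the form of the spectral distillation loss. One plausible instance, sketched here under that assumption, compares log-magnitude spectra of student and teacher features along the time axis; the function name, the (T, D) layout, and the `log1p` compression are illustrative choices rather than the paper's definition.

```python
import numpy as np


def spectral_distill_loss(student: np.ndarray, teacher: np.ndarray) -> float:
    """Mean squared distance between log-magnitude spectra along the time axis.

    student, teacher: (T, D) feature sequences (T time steps, D channels/dims),
    assumed to be projected to a shared dimension D beforehand.
    """
    s_mag = np.abs(np.fft.rfft(student, axis=0))  # (T // 2 + 1, D)
    t_mag = np.abs(np.fft.rfft(teacher, axis=0))
    # log1p compresses the large dynamic range of oscillatory power
    return float(np.mean((np.log1p(s_mag) - np.log1p(t_mag)) ** 2))


rng = np.random.default_rng(0)
t = np.arange(256) / 256.0  # 1 s of samples at an assumed 256 Hz rate
teacher = np.stack([np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 20 * t)], axis=1)
shifted = np.roll(teacher, 5, axis=0)  # circular time shift ("jitter")
noisy = teacher + 0.5 * rng.standard_normal(teacher.shape)

print(spectral_distill_loss(shifted, teacher))  # near zero
print(spectral_distill_loss(noisy, teacher))    # clearly positive
```

The DFT magnitude is unchanged by circular time shifts, so this kind of loss tolerates small temporal jitter that a sample-wise MSE would penalize heavily, while still penalizing spurious energy at frequencies the teacher does not contain. For non-circular shifts the invariance is only approximate.
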
Problem

Research questions and friction points this paper is trying to address.

EEG foundation models
knowledge distillation
embedded BCI systems
representational collapse
spectral distortion
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge distillation
EEG foundation models
dynamic routing
spectral distillation
structured compression