Autoregressive Visual Decoding from EEG Signals

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key challenges in EEG-to-image generation, including the large modality gap, complex multi-stage adaptation, error propagation, and reliance on computationally expensive diffusion models. The authors propose AVDE, a lightweight framework that uniquely integrates autoregressive generation with multi-scale image token prediction to hierarchically reconstruct visual content from EEG embeddings, aligning with the hierarchical nature of human visual perception. Built upon the pretrained EEG model LaBraM, AVDE employs contrastive learning to align EEG and image representations and leverages a VQ-VAE combined with a Transformer for efficient generation. Evaluated on two benchmark datasets, AVDE achieves superior performance in both image retrieval and reconstruction tasks while using only 10% of the parameters of state-of-the-art methods. Intermediate outputs further demonstrate the hierarchical plausibility and interpretability of the generation process.
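The "next-scale prediction" strategy described above can be illustrated with a minimal sketch. Here `predict_tokens` is a hypothetical stand-in for AVDE's transformer (not the authors' actual API): given all previously generated token maps, it returns the token map for the next, finer scale, with the coarsest entry playing the role of the EEG embedding.

```python
import numpy as np

def next_scale_generation(predict_tokens, scales=(1, 2, 4, 8)):
    """Coarse-to-fine autoregressive generation over multi-scale token maps.

    `predict_tokens(context, scale)` is a hypothetical predictor: given the
    list of token maps generated so far, it returns the (scale x scale) map
    for the next level. The first entry in the context stands in for the
    EEG-derived coarsest representation.
    """
    context = [np.zeros((1, 1), dtype=int)]    # coarsest level: EEG-derived start token
    for s in scales[1:]:
        next_map = predict_tokens(context, s)  # predict the next, finer token map
        assert next_map.shape == (s, s)
        context.append(next_map)               # condition later steps on everything so far
    # A VQ-VAE decoder would map the finest token map back to pixels.
    return context
```

Each intermediate map is a complete (if coarse) image representation, which is what makes the intermediate outputs inspectable, as the summary notes.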

📝 Abstract
Electroencephalogram (EEG) signals have become a popular medium for decoding visual information due to their cost-effectiveness and high temporal resolution. However, current approaches face significant challenges in bridging the modality gap between EEG and image data. These methods typically rely on complex, multi-stage adaptation processes, making it hard to maintain consistency and manage compounding errors. Furthermore, the computational overhead imposed by large-scale diffusion models limits their practicality in real-world brain-computer interface (BCI) applications. In this work, we present AVDE, a lightweight and efficient framework for visual decoding from EEG signals. First, we leverage LaBraM, a pre-trained EEG model, and fine-tune it via contrastive learning to align EEG and image representations. Second, we adopt an autoregressive generative framework based on a "next-scale prediction" strategy: images are encoded into multi-scale token maps using a pre-trained VQ-VAE, and a transformer is trained to autoregressively predict finer-scale tokens starting from EEG embeddings as the coarsest representation. This design enables coherent generation while preserving a direct connection between the input EEG signals and the reconstructed images. Experiments on two datasets show that AVDE outperforms previous state-of-the-art methods in both image retrieval and reconstruction tasks, while using only 10% of the parameters. In addition, visualization of intermediate outputs shows that the generative process of AVDE reflects the hierarchical nature of human visual perception. These results highlight the potential of autoregressive models as efficient and interpretable tools for practical BCI applications.
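The contrastive alignment step described in the abstract is a CLIP-style objective: paired EEG and image embeddings are pulled together while mismatched pairs are pushed apart. The sketch below is an assumption about the loss shape (symmetric InfoNCE), not the authors' exact implementation, and uses NumPy rather than a deep-learning framework for clarity.

```python
import numpy as np

def info_nce_loss(eeg_emb, img_emb, temperature=0.07):
    """Symmetric InfoNCE loss between paired EEG and image embeddings.

    eeg_emb, img_emb: (batch, dim) arrays; row i of each is a matched pair.
    """
    # L2-normalize so the dot product becomes cosine similarity.
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    logits = eeg @ img.T / temperature       # (batch, batch) similarity matrix
    labels = np.arange(len(logits))          # matched pairs sit on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)             # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the EEG-to-image and image-to-EEG directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss during fine-tuning is also what makes the retrieval evaluation possible: ranking images by cosine similarity to an EEG embedding directly uses the aligned space.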
Problem

Research questions and friction points this paper is trying to address.

EEG
visual decoding
modality gap
brain-computer interface
autoregressive generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

autoregressive decoding
EEG-to-image reconstruction
multi-scale token prediction
contrastive representation alignment
lightweight BCI framework
Sicheng Dai
Institute of Automation, Chinese Academy of Sciences
Hongwang Xiao
National Key Laboratory for Multimedia Information Processing, Peking University
Shan Yu
Institute of Automation, Chinese Academy of Sciences
Neuroscience
Qiwei Ye
Beijing Academy of Artificial Intelligence
Scientific AI · AI for Science · Foundation Model