🤖 AI Summary
This work addresses the challenges of large inter-subject variability, scarce labeled data, and limited interpretability in EEG-based emotion recognition by proposing the first multimodal large language model framework tailored for EEG signals. The approach integrates a pretrained EEG encoder, the Qwen large language model, and a learnable projection layer, leveraging emotion-discriminative pretraining, cross-modal alignment, and instruction tuning. Notably, it introduces a chain-of-thought reasoning mechanism—the first of its kind in this domain—to enhance model transparency and decision logic. Evaluated on a seven-class emotion dataset, the model achieves state-of-the-art classification performance and demonstrates superior generalization capabilities under zero-shot settings and complex scenarios, while offering improved interpretability through its reasoning process.
📝 Abstract
Emotion recognition from electroencephalography (EEG) signals remains challenging due to high inter-subject variability, limited labeled data, and the lack of interpretable reasoning in existing approaches. While recent multimodal large language models (MLLMs) have advanced emotion analysis, they have not been adapted to handle the unique spatiotemporal characteristics of neural signals. We present E^2-LLM (EEG-to-Emotion Large Language Model), the first MLLM framework for interpretable emotion analysis from EEG. E^2-LLM integrates a pretrained EEG encoder with Qwen-based LLMs through learnable projection layers, employing a multi-stage training pipeline that encompasses emotion-discriminative pretraining, cross-modal alignment, and instruction tuning with chain-of-thought reasoning. We design a comprehensive evaluation protocol covering basic emotion prediction, multi-task reasoning, and zero-shot scenario understanding. Experiments on the dataset across seven emotion categories demonstrate that E^2-LLM achieves excellent performance on emotion classification, with larger variants showing enhanced reliability and superior zero-shot generalization to complex reasoning scenarios. Our work establishes a new paradigm combining physiological signals with LLM reasoning capabilities, showing that model scaling improves both recognition accuracy and interpretable emotional understanding in affective computing.