CATVis: Context-Aware Thought Visualization

📅 2025-07-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of high-fidelity visual representation decoding and image generation from noisy, low signal-to-noise ratio (SNR) electroencephalography (EEG) signals. We propose a five-stage context-aware EEG-to-Image generation framework: (1) time-frequency feature extraction via a dedicated EEG encoder; (2) cross-modal alignment into the CLIP visual semantic space; (3) caption re-ranking and weighted interpolation to enhance semantic consistency; (4) diffusion-based image synthesis driven by aligned embeddings; and (5) end-to-end optimization integrating concept classification, caption reconstruction, and diffusion priors. Our key contribution is the joint optimization of these objectives to achieve direct EEG-to-image alignment. Experiments demonstrate significant improvements over state-of-the-art EEG decoding methods: +13.43% in classification accuracy, +15.21% in generation accuracy, and −36.61% in Fréchet Inception Distance (FID).

📝 Abstract
EEG-based brain-computer interfaces (BCIs) have shown promise in various applications, such as motor imagery and cognitive state monitoring. However, decoding visual representations from EEG signals remains a significant challenge due to their complex and noisy nature. We thus propose a novel 5-stage framework for decoding visual representations from EEG signals: (1) an EEG encoder for concept classification, (2) cross-modal alignment of EEG and text embeddings in CLIP feature space, (3) caption refinement via re-ranking, (4) weighted interpolation of concept and caption embeddings for richer semantics, and (5) image generation using a pre-trained Stable Diffusion model. We enable context-aware EEG-to-image generation through cross-modal alignment and re-ranking. Experimental results demonstrate that our method generates high-quality images aligned with visual stimuli, outperforming SOTA approaches by 13.43% in Classification Accuracy and 15.21% in Generation Accuracy, while reducing Fréchet Inception Distance by 36.61%, indicating superior semantic alignment and image quality.
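The weighted-interpolation step (stage 4) can be sketched in a few lines: both embeddings are L2-normalized, blended with a mixing weight, and projected back onto the unit sphere so the result stays compatible with CLIP's cosine-similarity geometry. The function name and the `alpha` weight below are illustrative assumptions, not values published in the paper.

```python
import numpy as np

def interpolate_embeddings(concept_emb: np.ndarray,
                           caption_emb: np.ndarray,
                           alpha: float = 0.7) -> np.ndarray:
    """Blend a concept-class embedding with a caption embedding.

    `alpha` is a hypothetical mixing weight (higher = closer to the
    concept embedding); the paper does not specify its exact value here.
    """
    # Normalize both inputs so the blend weights act on direction only.
    concept_emb = concept_emb / np.linalg.norm(concept_emb)
    caption_emb = caption_emb / np.linalg.norm(caption_emb)
    blended = alpha * concept_emb + (1.0 - alpha) * caption_emb
    # Re-normalize so the result lies back on the unit sphere.
    return blended / np.linalg.norm(blended)
```

A blend like this lets the coarse concept label anchor the semantics while the re-ranked caption contributes finer contextual detail.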
Problem

Research questions and friction points this paper is trying to address.

Decoding visual representations from noisy EEG signals
Improving EEG-to-image generation with semantic alignment
Enhancing image quality and accuracy in BCIs
Innovation

Methods, ideas, or system contributions that make the work stand out.

EEG encoder for concept classification
Cross-modal alignment in CLIP space
Stable Diffusion for image generation
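Cross-modal alignment in CLIP space is typically trained with a symmetric contrastive (InfoNCE) objective that pairs each EEG embedding with its matching text embedding within a batch. The NumPy sketch below is a generic illustration of that objective under stated assumptions, not the paper's exact loss; the `temperature` value is an assumption.

```python
import numpy as np

def clip_alignment_loss(eeg_emb: np.ndarray,
                        txt_emb: np.ndarray,
                        temperature: float = 0.07) -> float:
    """Symmetric InfoNCE loss: row i of eeg_emb matches row i of txt_emb."""
    # L2-normalize so the dot product is cosine similarity.
    eeg = eeg_emb / np.linalg.norm(eeg_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = eeg @ txt.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(logits))             # matched pairs on the diagonal

    def cross_entropy(l: np.ndarray, y: np.ndarray) -> float:
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the EEG-to-text and text-to-EEG directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Minimizing this loss pulls each EEG embedding toward its paired caption embedding and pushes it away from the other captions in the batch, which is what makes the aligned embeddings usable as conditioning for Stable Diffusion.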
Tariq Mehmood
Lahore University of Management Sciences, Lahore, Pakistan
Hamza Ahmad
Forman Christian College University, Lahore, Pakistan
Muhammad Haroon Shakeel
Arbisoft, Pakistan
Murtaza Taj
Associate Professor of Computer Science, LUMS School of Science & Engineering
Computer Science, Graphics, Image Processing, Computer Vision