🤖 AI Summary
Cross-subject fMRI-to-image decoding suffers from representational bias caused by inter-individual cognitive variability, and unidirectional fMRI-to-representation mappings feed partially inaccurate representations into diffusion models, where errors accumulate and degrade reconstruction fidelity. To address this, we propose the Bidirectional Autoencoder Intertwining framework, which incorporates a Subject Bias Modulation Module to unify semantic representations across subjects and exploits bidirectional mapping to better capture the underlying data distribution. A Semantic Refinement Module and a Visual Coherence Module further improve the decoded semantic representations and suppress the propagation of inaccurate visual representations into the generation stage. Integrated with ControlNet and Stable Diffusion, the method outperforms state-of-the-art approaches in both qualitative and quantitative evaluations: reconstructed images exhibit substantially higher semantic fidelity and visual quality. Moreover, the model adapts rapidly to new subjects from only a few fMRI samples, markedly improving generalizability and practical applicability.
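The abstract does not spell out how the Subject Bias Modulation Module or the bidirectional mapping are implemented. As a rough illustration only, the sketch below assumes a FiLM-style per-subject affine modulation feeding a shared encoder/decoder pair trained in both directions; all class names, layer choices, and dimensions (`SubjectBiasModulation`, `BidirectionalAutoencoder`, `rep_dim`, etc.) are hypothetical and not taken from the paper.

```python
import torch
import torch.nn as nn


class SubjectBiasModulation(nn.Module):
    """Hypothetical sketch: per-subject scale/shift that maps subject-specific
    fMRI features into a shared feature space before encoding."""

    def __init__(self, num_subjects: int, fmri_dim: int):
        super().__init__()
        self.scale = nn.Embedding(num_subjects, fmri_dim)
        self.shift = nn.Embedding(num_subjects, fmri_dim)
        nn.init.ones_(self.scale.weight)   # start as identity modulation
        nn.init.zeros_(self.shift.weight)

    def forward(self, fmri: torch.Tensor, subject_id: torch.Tensor) -> torch.Tensor:
        # fmri: (batch, fmri_dim), subject_id: (batch,) integer ids
        return fmri * self.scale(subject_id) + self.shift(subject_id)


class BidirectionalAutoencoder(nn.Module):
    """Hypothetical sketch: forward map (fMRI -> representation) and backward
    map (representation -> fMRI) trained jointly, so the backward cycle
    constrains the latent used later by the diffusion-based generator."""

    def __init__(self, num_subjects: int, fmri_dim: int, rep_dim: int):
        super().__init__()
        self.modulate = SubjectBiasModulation(num_subjects, fmri_dim)
        self.encode = nn.Sequential(
            nn.Linear(fmri_dim, rep_dim), nn.GELU(), nn.Linear(rep_dim, rep_dim)
        )
        self.decode = nn.Sequential(
            nn.Linear(rep_dim, rep_dim), nn.GELU(), nn.Linear(rep_dim, fmri_dim)
        )

    def forward(self, fmri: torch.Tensor, subject_id: torch.Tensor):
        shared = self.modulate(fmri, subject_id)   # subject bias removed
        rep = self.encode(shared)                  # representation for the generator
        fmri_rec = self.decode(rep)                # backward pass / cycle constraint
        return rep, fmri_rec, shared
```

In such a setup, training would plausibly combine a representation loss (e.g., matching `rep` to a target image embedding of the stimulus) with a reconstruction loss between `fmri_rec` and the modulated input; the actual objectives used in the paper may differ.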
📝 Abstract
Decoding stimulus images from fMRI signals has advanced with pre-trained generative models. However, existing methods struggle with cross-subject mappings due to cognitive variability and subject-specific differences. This challenge arises from sequential errors, where unidirectional mappings generate partially inaccurate representations that, when fed into diffusion models, accumulate errors and degrade reconstruction fidelity. To address this, we propose the Bidirectional Autoencoder Intertwining framework for accurate decoded representation prediction. Our approach unifies multiple subjects through a Subject Bias Modulation Module while leveraging bidirectional mapping to better capture data distributions for precise representation prediction. To further enhance fidelity when decoding representations into stimulus images, we introduce a Semantic Refinement Module to improve semantic representations and a Visual Coherence Module to mitigate the effects of inaccurate visual representations. Integrated with ControlNet and Stable Diffusion, our method outperforms state-of-the-art approaches on benchmark datasets in both qualitative and quantitative evaluations. Moreover, our framework exhibits strong adaptability to new subjects with minimal training samples.
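The abstract also emphasizes adaptation to new subjects with minimal training samples. Below is a minimal, purely illustrative sketch of one way such few-shot adaptation could be realized on the hypothetical model above: freeze the shared bidirectional backbone and fit only the new subject's modulation parameters. The function name, losses, and hyperparameters are assumptions, not details from the paper, and the new subject is assumed to occupy a reserved slot in the subject embeddings.

```python
import torch
import torch.nn.functional as F


def adapt_to_new_subject(model, new_subject_id, fmri_samples, target_reps,
                         steps: int = 200, lr: float = 1e-3):
    """Hypothetical few-shot adaptation: tune only per-subject modulation.

    fmri_samples: (n, fmri_dim) few fMRI samples from the new subject
    target_reps:  (n, rep_dim) target representations for those samples
    """
    # Freeze the shared encoder/decoder.
    for p in model.parameters():
        p.requires_grad_(False)
    # Re-enable gradients only for the per-subject embeddings.
    model.modulate.scale.weight.requires_grad_(True)
    model.modulate.shift.weight.requires_grad_(True)

    opt = torch.optim.Adam(
        [model.modulate.scale.weight, model.modulate.shift.weight], lr=lr
    )
    sid = torch.full((fmri_samples.size(0),), new_subject_id, dtype=torch.long)

    for _ in range(steps):
        rep, fmri_rec, shared = model(fmri_samples, sid)
        # Representation loss plus backward-cycle reconstruction loss.
        loss = F.mse_loss(rep, target_reps) + F.mse_loss(fmri_rec, shared)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model
```

Because only the rows of the subject embeddings receive gradients, a handful of samples can suffice to calibrate a new subject without retraining the shared pipeline, which is consistent with the adaptability claim in the abstract, though the paper's actual adaptation procedure is not specified here.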