Modality-Aware and Anatomical Vector-Quantized Autoencoding for Multimodal Brain MRI

📅 2026-04-06
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work proposes NeuroQuant, a 3D vector-quantized autoencoder designed for multimodal brain MRI that overcomes the limitations of existing variational autoencoders, which are typically restricted to single-modality inputs and thus fail to exploit complementary information across modalities such as T1 and T2. NeuroQuant introduces a dual-stream encoder to explicitly disentangle anatomical structure from modality-specific appearance, employs a shared codebook with a factorized multi-axis attention mechanism to learn cross-modal shared representations, and leverages FiLM for high-fidelity decoding. Trained via a joint 2D/3D strategy, the model significantly outperforms current VAE approaches on two multimodal brain MRI datasets, achieving superior reconstruction quality and establishing a scalable foundation for downstream generative modeling and cross-modal analysis.
📝 Abstract
Learning a robust Variational Autoencoder (VAE) is a fundamental step for many deep learning applications in medical image analysis, such as MRI synthesis. Existing brain VAEs predominantly focus on single-modality data (e.g., T1-weighted MRI), overlooking the complementary diagnostic value of other modalities such as T2-weighted MRI. Here, we propose a modality-aware and anatomically grounded 3D vector-quantized VAE (VQ-VAE), called NeuroQuant, for reconstructing multi-modal brain MRIs. It first learns a shared latent representation across modalities using factorized multi-axis attention, which can capture relationships between distant brain regions. It then employs a dual-stream 3D encoder that explicitly separates the encoding of modality-invariant anatomical structure from modality-dependent appearance. The anatomical encoding is discretized using a shared codebook and combined with the modality-specific appearance features via Feature-wise Linear Modulation (FiLM) during decoding. The model is trained with a joint 2D/3D strategy to account for the slice-based acquisition of 3D MRI data. Extensive experiments on two multi-modal brain MRI datasets demonstrate that NeuroQuant achieves superior reconstruction fidelity compared to existing VAEs, providing a scalable foundation for downstream generative modeling and cross-modal brain image analysis.
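The discretization step the abstract describes is the standard VQ-VAE codebook lookup: each continuous anatomical latent is replaced by its nearest entry in a shared codebook. Below is a minimal NumPy sketch of that forward lookup, not the paper's implementation; the shapes and names are illustrative, and the straight-through gradient estimator used during training is omitted.

```python
import numpy as np

def quantize(z, codebook):
    """Map each latent vector in z (N, D) to its nearest codebook entry.

    z:        (N, D) continuous latents from the anatomical encoder stream
    codebook: (K, D) shared codebook
    Returns the quantized latents (N, D) and the chosen code indices (N,).
    Training would additionally use a straight-through estimator and a
    commitment loss; this sketch covers only the forward pass.
    """
    # Squared Euclidean distance between every latent and every code: (N, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d.argmin(axis=1)  # index of the nearest code per latent
    return codebook[idx], idx

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
z = np.array([[0.1, -0.1], [0.9, 1.2]])
zq, idx = quantize(z, codebook)
# → idx == [0, 1]; each row of zq is the matching codebook entry
```

Because the codebook is shared across modalities, latents from T1 and T2 inputs that encode the same anatomy can map to the same discrete codes, which is what makes the representation modality-invariant.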
Problem

Research questions and friction points this paper is trying to address.

multimodal brain MRI
variational autoencoder
modality-aware learning
anatomical representation
medical image reconstruction
Innovation

Methods, ideas, or system contributions that make the work stand out.

modality-aware
anatomical vector quantization
dual-stream encoding
multi-axis attention
FiLM modulation
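FiLM, listed among the contributions above, conditions the decoder on modality-specific appearance by scaling and shifting each feature channel. A minimal sketch of the modulation itself follows; the per-channel `gamma` and `beta` would in practice be predicted from the appearance encoding, and all shapes here are assumptions for illustration.

```python
import numpy as np

def film(features, gamma, beta):
    """Feature-wise Linear Modulation over a 3D feature map.

    features:     (C, D, H, W) anatomical feature map in the decoder
    gamma, beta:  (C,) per-channel scale and shift, in practice predicted
                  from the modality-specific appearance code
    """
    return gamma[:, None, None, None] * features + beta[:, None, None, None]

feats = np.ones((2, 4, 4, 4))
out = film(feats, gamma=np.array([2.0, 0.5]), beta=np.array([1.0, 0.0]))
# channel 0 becomes 2*1 + 1 = 3.0, channel 1 becomes 0.5*1 + 0 = 0.5
```

Swapping the appearance code while keeping the quantized anatomy fixed would re-render the same structure in a different modality's contrast, which is the intuition behind the dual-stream design.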