🤖 AI Summary
This study addresses the problem of cross-modal decoding from functional magnetic resonance imaging (fMRI) signals to video. Methodologically, it proposes the first biologically inspired ventral–dorsal dual-pathway framework that jointly models semantic ("What"), spatial ("Where"), and motion ("How") representations, mirroring canonical neuroanatomical principles. The architecture follows a decompose-then-fuse design: a multi-branch diffusion decoder aligns neural features via cross-modal projection and gated fusion, explicitly modeling motion dynamics in fMRI-to-video synthesis for the first time while establishing interpretable correspondences between the decoding branches and the ventral and dorsal streams. Experiments demonstrate state-of-the-art performance: 82.4% semantic classification accuracy, 70.6% spatial consistency, 0.212 cosine similarity for motion prediction, and 21.9% 50-way top-1 accuracy for video generation. Neural encoding analyses further corroborate the two-streams hypothesis.
📝 Abstract
Decoding visual experiences from brain activity is a significant challenge. Existing fMRI-to-video methods often focus on semantic content while overlooking spatial and motion information, yet all of these aspects are essential and are processed through distinct pathways in the brain. Motivated by this, we propose DecoFuse, a novel brain-inspired framework for decoding videos from fMRI signals. It first decomposes the video into three components (semantic, spatial, and motion), then decodes each component separately before fusing them to reconstruct the video. This approach not only simplifies the complex task of video decoding by breaking it into manageable sub-tasks, but also establishes a clearer connection between the learned representations and their biological counterparts, as supported by ablation studies. Further, our experiments show significant improvements over previous state-of-the-art methods, achieving 82.4% accuracy for semantic classification, 70.6% accuracy for spatial consistency, 0.212 cosine similarity for motion prediction, and 21.9% 50-way accuracy for video generation. Additionally, neural encoding analyses of semantic and spatial information align with the two-streams hypothesis, further validating the distinct roles of the ventral and dorsal pathways. Overall, DecoFuse provides a strong and biologically plausible framework for fMRI-to-video decoding. Project page: https://chongjg.github.io/DecoFuse/.
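The decompose-then-fuse idea above (separate semantic, spatial, and motion branches recombined by gated fusion) can be sketched in a few lines. This is a minimal illustrative sketch only, not the paper's actual implementation: the feature dimension, the single-layer gating network, and names such as `gated_fusion` are assumptions introduced here for clarity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(f_sem, f_spa, f_mot, W, b):
    """Fuse semantic ('What'), spatial ('Where'), and motion ('How') branch
    features with one learned gate per branch (illustrative sketch only)."""
    concat = np.concatenate([f_sem, f_spa, f_mot])  # shape (3d,)
    gates = sigmoid(W @ concat + b)                 # shape (3,), one gate in (0, 1) per branch
    fused = gates[0] * f_sem + gates[1] * f_spa + gates[2] * f_mot
    return fused, gates

# Toy example with random branch features and hypothetical gate weights
rng = np.random.default_rng(0)
d = 8                                               # assumed feature dimension
f_sem, f_spa, f_mot = (rng.standard_normal(d) for _ in range(3))
W = 0.1 * rng.standard_normal((3, 3 * d))
b = np.zeros(3)
fused, gates = gated_fusion(f_sem, f_spa, f_mot, W, b)
print(fused.shape, gates)
```

In the real system the fused representation would condition a video diffusion decoder; here the gates simply show how the three decoded components can be recombined with learned, input-dependent weights.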