BrainVista: Modeling Naturalistic Brain Dynamics as Multimodal Next-Token Prediction

📅 2026-02-04
🤖 AI Summary
This work addresses the challenge of causal forward simulation of naturalistic brain dynamics, which is hindered by the timescale mismatch between multimodal inputs and the complex topology of cortical networks. To overcome this limitation, we propose BrainVista, a framework that models brain-state evolution through multimodal autoregressive learning. BrainVista introduces Network-wise Tokenizers to disentangle system-specific dynamics, a Spatial Mixer Head to capture information flow across networks, and a Stimulus-to-Brain (S2B) masking mechanism to align high-frequency sensory stimuli with hemodynamic responses. Evaluated on the Algonauts 2025, CineBrain, and HAD datasets, the approach achieves state-of-the-art fMRI encoding performance and, in long-horizon rollouts, improves pattern correlation over the strongest baseline by 36.0% on Algonauts 2025 and 33.3% on CineBrain.
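The headline numbers are relative gains in pattern correlation. The summary does not define the metric, so the sketch below assumes a common reading: at each timepoint, take the Pearson correlation across parcels between the predicted and observed activity pattern, average it over the rollout, and report the gain as (ours - baseline) / baseline. The function names are hypothetical and not taken from BrainVista.

```python
import math
from statistics import mean
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx * sy == 0:
        # Correlation is undefined for a constant pattern.
        raise ValueError("constant pattern")
    return cov / (sx * sy)


def pattern_correlation(pred: List[List[float]], true: List[List[float]]) -> float:
    """Assumed metric: spatial (across-parcel) correlation per timepoint, averaged over time.

    pred and true are lists of timepoints; each timepoint is a list of parcel values.
    """
    return mean(pearson(p, t) for p, t in zip(pred, true))


def relative_improvement(ours: float, baseline: float) -> float:
    """Relative gain of a score over a baseline score, in percent."""
    return 100.0 * (ours - baseline) / baseline
```

Under this reading, a 36.0% gain means the model's averaged spatial correlation is 1.36 times that of the strongest baseline, not 0.36 higher in absolute terms.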

📝 Abstract
Naturalistic fMRI characterizes the brain as a dynamic predictive engine driven by continuous sensory streams. However, modeling the causal forward evolution in realistic neural simulation is impeded by the timescale mismatch between multimodal inputs and the complex topology of cortical networks. To address these challenges, we introduce BrainVista, a multimodal autoregressive framework designed to model the causal evolution of brain states. BrainVista incorporates Network-wise Tokenizers to disentangle system-specific dynamics and a Spatial Mixer Head that captures inter-network information flow without compromising functional boundaries. Furthermore, we propose a novel Stimulus-to-Brain (S2B) masking mechanism to synchronize high-frequency sensory stimuli with hemodynamically filtered signals, enabling strict, history-only causal conditioning. We validate our framework on Algonauts 2025, CineBrain, and HAD, achieving state-of-the-art fMRI encoding performance. In long-horizon rollout settings, our model yields substantial improvements over baselines, increasing pattern correlation by 36.0% and 33.3% relative to the strongest baseline on Algonauts 2025 and CineBrain, respectively.
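The abstract does not spell out how S2B masking is implemented. One way to realize strict, history-only conditioning across mismatched sampling rates is to timestamp every token and let a brain token attend only to stimulus tokens that are not later than it. The sketch below assumes integer millisecond timestamps (brain token t at t * tr_ms, stimulus token j at j * stim_period_ms) so the comparison is exact. The function names and the timestamping convention are illustrative assumptions, and any handling of hemodynamic lag in the actual model is not represented.

```python
from typing import List


def s2b_mask(n_brain: int, n_stim: int, tr_ms: int, stim_period_ms: int) -> List[List[bool]]:
    """Hypothetical cross-attention mask from brain tokens (rows) to stimulus tokens (columns).

    Assumed convention: brain token t is stamped at t * tr_ms and stimulus token j
    at j * stim_period_ms. Stimulus token j is visible to brain token t only if it
    is not later than that brain token, so no future stimulus leaks in.
    """
    return [
        [j * stim_period_ms <= t * tr_ms for j in range(n_stim)]
        for t in range(n_brain)
    ]


def brain_causal_mask(n_brain: int) -> List[List[bool]]:
    """Standard autoregressive mask: brain token t sees brain tokens 0..t."""
    return [[s <= t for s in range(n_brain)] for t in range(n_brain)]
```

With a 2 s TR and one stimulus feature every 500 ms, brain token 1 (at 2000 ms) sees stimulus tokens 0 through 4. Pairing this cross-mask with the lower-triangular mask over brain tokens gives conditioning in which no token sees the future.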
Problem

Research questions and friction points this paper is trying to address.

naturalistic fMRI
brain dynamics
causal modeling
multimodal inputs
cortical networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal autoregressive modeling
Network-wise Tokenizers
Spatial Mixer Head
Stimulus-to-Brain masking
causal brain dynamics
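The Network-wise Tokenizers and Spatial Mixer Head listed above are only named in the abstract. The following is a minimal sketch, assuming parcels are grouped by functional-network labels, each network has its own linear projection to a token, and mixing is residual softmax attention across network tokens so each network keeps its own slot. The class and function names, the network labels, and the random weights are hypothetical, chosen for illustration rather than taken from the paper.

```python
import math
import random
from typing import Dict, List

Vector = List[float]


class NetworkTokenizer:
    """Hypothetical tokenizer: one independent linear projection per functional network."""

    def __init__(self, networks: Dict[str, List[int]], d_model: int, seed: int = 0):
        rng = random.Random(seed)
        self.networks = networks
        # weights[name] is a d_model x (parcels in that network) matrix of random values.
        self.weights = {
            name: [[rng.gauss(0.0, 0.1) for _ in idx] for _ in range(d_model)]
            for name, idx in networks.items()
        }

    def __call__(self, frame: Vector) -> Dict[str, Vector]:
        """Map one fMRI frame (all parcels) to one token per network."""
        tokens = {}
        for name, idx in self.networks.items():
            x = [frame[i] for i in idx]  # read only this network's parcels
            tokens[name] = [sum(w * v for w, v in zip(row, x)) for row in self.weights[name]]
        return tokens


def spatial_mix(tokens: Dict[str, Vector]) -> Dict[str, Vector]:
    """Residual scaled dot-product attention across network tokens (no learned weights)."""
    names = list(tokens)
    d = len(tokens[names[0]])
    out = {}
    for q in names:
        scores = [sum(a * b for a, b in zip(tokens[q], tokens[k])) / math.sqrt(d) for k in names]
        top = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - top) for s in scores]
        attn = [e / sum(exps) for e in exps]
        mixed = [sum(w * tokens[k][i] for w, k in zip(attn, names)) for i in range(d)]
        # Residual connection: each network keeps its own token and adds what it reads.
        out[q] = [a + b for a, b in zip(tokens[q], mixed)]
    return out
```

Because each projection reads only its own network's parcels, perturbing a visual parcel changes only the visual token; in this sketch, interaction between networks happens solely in the mixing step.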