🤖 AI Summary
Current neuroimaging methods can only reconstruct category-selective responses in individual brain regions (e.g., face selectivity in FFA) but fail to model dynamic, cross-regional interactions during natural vision. Method: We propose NeuroVolve, a gradient-guided, programmable neural optimization framework that leverages pre-trained vision-language model embeddings as a semantic prior and uses fMRI signals as supervision to synthesize images. It jointly enforces multi-region co-activation, antagonism, and hierarchical constraints. Contribution/Results: NeuroVolve unifies low-level feature and high-level semantic control in stimulus design, enabling personalized neural probes and visualization of semantic evolution trajectories. Experiments replicate canonical category selectivity and generate naturalistic scenes that satisfy empirically grounded multi-region coupling and decorrelation constraints. This establishes a novel, interpretable, closed-loop paradigm for neural encoding research.
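The "programmable neural objective" summarized above can be pictured as a signed, weighted combination of per-region activation terms plus a decorrelation penalty for antagonistic pairs. The sketch below is a minimal illustration, not the paper's implementation; the region encoders, weights, and region names are all hypothetical stand-ins:

```python
import numpy as np

def neural_objective(embedding, encoders, weights, decorrelate=()):
    """Score an image embedding against a programmable multi-region objective.

    encoders:    dict of region name -> callable(embedding) -> predicted activation
    weights:     dict of region name -> signed weight (+ activate, - suppress)
    decorrelate: pairs of regions whose predicted responses should diverge
    """
    responses = {r: enc(embedding) for r, enc in encoders.items()}
    # Co-activation / suppression: signed weighted sum of predicted responses
    score = sum(weights[r] * responses[r] for r in weights)
    # Antagonism: penalize joint activation of decorrelated region pairs
    for a, b in decorrelate:
        score -= responses[a] * responses[b]
    return score

# Toy linear "encoders" standing in for learned fMRI response models
encoders = {"FFA": lambda e: e[0], "PPA": lambda e: e[1]}
score = neural_objective(
    np.array([1.0, 2.0]), encoders,
    weights={"FFA": 1.0, "PPA": -0.5},
    decorrelate=[("FFA", "PPA")],
)
```

An image generator would then be steered by ascending this score with respect to the embedding, which is what makes the objective "programmable": co-activation, suppression, and antagonism are just different weight and pair settings.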
📝 Abstract
What visual information is encoded in individual brain regions, and how do distributed patterns across regions combine into coherent neural representations? Prior work has used generative models to replicate known category selectivity in isolated regions (e.g., faces in FFA), but these approaches offer limited insight into how regions interact during complex, naturalistic vision. We introduce NeuroVolve, a generative framework for brain-guided image synthesis via optimization of a neural objective function in the embedding space of a pretrained vision-language model. Images are generated under the guidance of a programmable neural objective, e.g., activating or deactivating a single region or multiple regions together. NeuroVolve is validated by recovering known selectivity for individual brain regions, and it extends to synthesizing coherent scenes that satisfy complex, multi-region constraints. By tracking optimization steps, it reveals semantic trajectories through embedding space, unifying brain-guided image editing and preferred-stimulus generation in a single process. We show that NeuroVolve can generate both low-level and semantic feature-specific stimuli for single ROIs, as well as stimuli aligned to curated neural objectives. These include co-activation and decorrelation between regions, exposing cooperative and antagonistic tuning relationships. Notably, the framework captures subject-specific preferences, supporting personalized brain-driven synthesis and offering interpretable constraints for mapping, analyzing, and probing neural representations of visual information.
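The optimization process the abstract describes, i.e., ascending a neural objective in embedding space while logging intermediate embeddings as a semantic trajectory, can be sketched as a toy gradient-ascent loop. Everything here is illustrative: the quadratic objective below stands in for a learned brain-response model, and `evolve_embedding` is a hypothetical helper, not the paper's API:

```python
import numpy as np

def evolve_embedding(objective_grad, z0, steps=100, lr=0.1):
    """Ascend a neural objective in embedding space.

    Returns the final embedding and the full trajectory of snapshots;
    decoding each snapshot back to an image would visualize the
    semantic evolution the abstract refers to.
    """
    z = np.asarray(z0, dtype=float)
    trajectory = [z.copy()]
    for _ in range(steps):
        z = z + lr * objective_grad(z)  # gradient ascent step
        trajectory.append(z.copy())
    return z, trajectory

# Toy objective: pull the embedding toward a "preferred stimulus" target,
# i.e., gradient of -0.5 * ||z - target||^2
target = np.array([1.0, -2.0])
grad = lambda z: target - z
z_final, traj = evolve_embedding(grad, np.zeros(2), steps=200, lr=0.1)
```

Because every intermediate embedding is retained, image editing (starting from a real image's embedding) and preferred-stimulus generation (starting from a neutral point) fall out of the same loop, differing only in the initialization `z0`.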