AI Summary
Traditional voxel-wise encoding models are susceptible to fMRI noise, inter-subject variability, and spatial redundancy. This work proposes the first encoding framework that integrates independent component analysis (ICA) with large language models (LLMs): ICA is first applied to decompose fMRI data collected during naturalistic story listening, and LLM-derived linguistic representations are then used to predict the time series of individual independent components. This approach enables cross-subject consistent and interpretable modeling of neural responses at the functional network level, while using ICA-AROMA to identify noise and motion-related components. The experiments identify multiple stable, highly predictable components across subjects, predominantly localized in auditory and language networks; the time series of the auditory components correlate strongly with acoustic stimulus features. In contrast, noise-related components exhibit low predictive performance, supporting the model's validity and specificity.
Abstract
Encoding models provide a powerful framework for linking continuous stimulus features to neural activity; however, traditional voxelwise approaches are limited by measurement noise, inter-subject variability, and redundancy arising from spatially correlated voxels encoding overlapping neural signals. Here, we propose an independent component (IC)-based encoding framework that dissociates stimulus-driven and noise-driven signals in fMRI data. We decompose continuous fMRI data from naturalistic story listening into ICs using one subset of the data, and train encoding models on independent data to predict IC time series from large language model representations of linguistic input. Across subjects, a subset of ICs exhibited consistently high predictivity. These ICs were spatially and temporally consistent across subjects and included cognitive networks known to respond during story listening (auditory and language). Auditory component time series were strongly correlated with acoustic stimulus features, highlighting the interpretability of identified component time series. Components identified as noise or motion-related artifacts by ICA-AROMA showed uniformly poor predictive performance, confirming that highly predicted components reflect genuine stimulus-related neural signals rather than confounds. Overall, IC-based encoding models enable analyses at the level of functional networks, accommodating the variability in network locations across individuals and providing interpretable results that are easy to compare across subjects.
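The encoding step described above can be illustrated with a minimal sketch. It does not reproduce the authors' pipeline: the abstract does not name the regression method, so closed-form ridge regression is assumed here, and the data are synthetic. The random feature matrix stands in for LLM representations, and the two simulated component time series (one stimulus-driven, one pure noise) stand in for ICs obtained from a separate ICA decomposition. All function names are hypothetical, and hemodynamic delays, which a real fMRI analysis would need to model, are omitted.

```python
import numpy as np


def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)


def columnwise_corr(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / denom


def simulate(seed=0, n_train=400, n_test=200, n_features=20):
    """Synthetic stand-ins (assumption): features and two component time series."""
    rng = np.random.default_rng(seed)
    n_total = n_train + n_test
    X = rng.standard_normal((n_total, n_features))      # stand-in for LLM features
    w = rng.standard_normal(n_features)
    stimulus_ic = X @ w + 0.5 * rng.standard_normal(n_total)  # stimulus-driven component
    noise_ic = rng.standard_normal(n_total)                   # component unrelated to stimulus
    Y = np.column_stack([stimulus_ic, noise_ic])
    # Train and test on disjoint time points, mirroring the held-out evaluation.
    return X[:n_train], Y[:n_train], X[n_train:], Y[n_train:]


X_train, Y_train, X_test, Y_test = simulate()
W = fit_ridge(X_train, Y_train, alpha=10.0)

# One predictivity score per component: correlation of predicted vs. held-out time series.
scores = columnwise_corr(X_test @ W, Y_test)
print("stimulus-driven IC:", scores[0])
print("noise IC:", scores[1])
```

In this toy setting the stimulus-driven component is predicted well on held-out data while the noise component is not, which is the qualitative pattern the abstract reports for neural versus ICA-AROMA-flagged components. Because each component gets a single score, results can be compared across subjects by component rather than by voxel location.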