Defining and Benchmarking a Data-Centric Design Space for Brain Graph Construction

📅 2025-08-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Conventional fMRI-based brain graph construction relies on rigid, fixed pipelines that overlook data-centric design choices. Method: This paper systematically defines and benchmarks a data-centric design space for brain graph construction, organized into three stages: temporal signal processing, topology extraction, and graph featurization. The strategies studied include high-amplitude BOLD signal filtering, sparsification and unification of connectivity, alternative correlation metrics, and multi-view node and edge features such as lagged dynamics. Contribution/Results: Experiments on the HCP1200 and ABIDE datasets show that thoughtful data-centric configurations consistently improve classification accuracy over standard pipelines, highlighting how strongly upstream data decisions shape downstream performance.

📝 Abstract
The construction of brain graphs from functional Magnetic Resonance Imaging (fMRI) data plays a crucial role in enabling graph machine learning for neuroimaging. However, current practices often rely on rigid pipelines that overlook critical data-centric choices in how brain graphs are constructed. In this work, we adopt a Data-Centric AI perspective and systematically define and benchmark a data-centric design space for brain graph construction, contrasting with primarily model-centric prior work. We organize this design space into three stages: temporal signal processing, topology extraction, and graph featurization. Our contributions lie less in novel components and more in evaluating how combinations of existing and modified techniques influence downstream performance. Specifically, we study high-amplitude BOLD signal filtering, sparsification and unification strategies for connectivity, alternative correlation metrics, and multi-view node and edge features, such as incorporating lagged dynamics. Experiments on the HCP1200 and ABIDE datasets show that thoughtful data-centric configurations consistently improve classification accuracy over standard pipelines. These findings highlight the critical role of upstream data decisions and underscore the importance of systematically exploring the data-centric design space for graph-based neuroimaging. Our code is available at https://github.com/GeQinwen/DataCentricBrainGraphs.
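To make the three stages in the abstract concrete, here is a minimal NumPy sketch of one possible configuration. It is not the authors' implementation (that lives in the linked repository). The function names, the RMS-based amplitude criterion, the `quantile` and `density` parameters, and the single-lag edge feature are all assumptions chosen for illustration. The input `ts` is assumed to be a `(time points, ROIs)` array with no constant columns.

```python
import numpy as np


def filter_high_amplitude(ts, quantile=0.5):
    """Stage 1 (assumed criterion): keep frames whose RMS z-scored
    amplitude across ROIs is at or above the given quantile."""
    z = (ts - ts.mean(axis=0)) / ts.std(axis=0)
    amplitude = np.sqrt((z ** 2).mean(axis=1))
    keep = amplitude >= np.quantile(amplitude, quantile)
    return ts[keep]


def correlation_topology(ts, density=0.2):
    """Stage 2: Pearson correlation, then keep the strongest
    `density` fraction of ROI pairs (by absolute value) as edges."""
    corr = np.corrcoef(ts, rowvar=False)
    corr = (corr + corr.T) / 2.0          # enforce exact symmetry
    np.fill_diagonal(corr, 0.0)           # no self-loops
    n = corr.shape[0]
    weights = np.abs(corr[np.triu_indices(n, k=1)])
    k = max(1, int(round(density * weights.size)))
    threshold = np.sort(weights)[-k]      # k-th largest pair weight
    adj = (np.abs(corr) >= threshold) & ~np.eye(n, dtype=bool)
    return corr, adj


def lagged_correlation(ts, lag=1):
    """Stage 3 (one possible extra view): correlation between
    ROI i at time t and ROI j at time t + lag."""
    a, b = ts[:-lag], ts[lag:]
    a = (a - a.mean(axis=0)) / a.std(axis=0)
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return a.T @ b / a.shape[0]


def build_brain_graph(ts, quantile=0.5, density=0.2, lag=1):
    """Combine the stages into node features, adjacency, edge features."""
    corr, adj = correlation_topology(filter_high_amplitude(ts, quantile), density)
    lagged = lagged_correlation(ts, lag)  # needs the contiguous series
    node_features = corr                  # each ROI's connectivity profile
    edge_features = np.stack([corr, lagged], axis=-1)
    return node_features, adj, edge_features
```

The lagged view is computed on the unfiltered series because frame filtering breaks temporal contiguity. Whether the paper's pipeline orders the steps this way is not stated here, so treat the ordering as a design choice of this sketch.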
Problem

Research questions and friction points this paper is trying to address.

Systematically defining data-centric design for brain graph construction
Evaluating impact of existing techniques on downstream performance
Improving classification accuracy via thoughtful data configurations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-centric AI for brain graph construction
Three-stage design space organization
Improved classification via data-centric configurations
Qinwen Ge
Vanderbilt University, Nashville, TN, USA
Roza G. Bayrak
Vanderbilt University, Nashville, TN, USA
Anwar Said
AI Research Scientist at Institute for Software Integrated Systems, Vanderbilt University
Social Network Analysis, Graph Machine Learning, Graph Neural Networks, Gen AI, Data Science
Catie Chang
Vanderbilt University, Nashville, TN, USA
Xenofon Koutsoukos
Vanderbilt University
Tyler Derr
Vanderbilt University, Nashville, TN, USA