Spectral Coherence Index: A Model-Free Metric for Protein Structural Ensemble Quality Assessment

2026-03-26
AI Summary
Distinguishing whether conformational variability in NMR-derived protein structural ensembles arises from concerted motions or noise artifacts remains challenging. This work evaluates the Spectral Coherence Index (SCI), a model-free, rotation-invariant metric that quantifies the internal consistency of an ensemble through the participation-ratio effective rank of the inter-model pairwise distance-variance matrix. SCI is presented as an interpretable, bounded summary of ensemble coherence for quality control of heterogeneous protein ensembles. In separating experimental ensembles from matched synthetic incoherent controls, SCI reaches an AUC of 0.973 on the 110-ensemble Main110 cohort and 0.983 on an independent 11-protein holdout. The evaluation combines ROC analysis, grouped cross-validation, and residue-level comparisons against experimental RMSF and GNM-based flexibility. Per-residue SCI contributions are linked to experimental RMSF and are broadly concordant with GNM flexibility patterns, and the authors find SCI most useful as part of a multimetric QC workflow.
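The summary names the two ingredients of SCI (a distance-variance matrix and its participation-ratio effective rank) without giving formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes an ensemble stored as an array of shape (models, residues, 3), takes the across-model variance of every residue-residue distance, and applies the standard participation ratio (sum of eigenvalues squared over sum of squared eigenvalues). Because the matrix has a zero diagonal, its raw eigenvalues sum to zero, so the sketch uses their absolute values; that choice, the function names, and the input layout are all assumptions. How the effective rank is mapped onto the bounded SCI score is not stated here and is left out.

```python
import numpy as np

def distance_variance_matrix(coords):
    """Across-model variance of each pairwise distance.

    coords: array of shape (M, N, 3) with M models and N residues
    (for example C-alpha positions). This layout is an assumption.
    Returns an (N, N) symmetric matrix with a zero diagonal.
    """
    coords = np.asarray(coords, dtype=float)
    # One (N, N) distance matrix per model. Distances do not change under
    # rotation or translation, so no superposition step is needed.
    dists = np.stack([
        np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        for x in coords
    ])
    return dists.var(axis=0)

def participation_ratio(matrix):
    """Participation-ratio effective rank of a symmetric matrix.

    Uses absolute eigenvalues (an assumption): the distance-variance
    matrix has zero trace, so signed eigenvalues would sum to zero.
    Returns 0.0 for an all-zero matrix (a perfectly rigid ensemble).
    """
    eig = np.abs(np.linalg.eigvalsh(matrix))
    total = eig.sum()
    if total == 0.0:
        return 0.0
    return float(total ** 2 / np.sum(eig ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    toy_ensemble = rng.normal(size=(10, 30, 3))  # synthetic, not real data
    V = distance_variance_matrix(toy_ensemble)
    print("effective rank:", participation_ratio(V))
```

The effective rank lies between 1 (variance concentrated in a single mode) and the number of residues (variance spread evenly across all modes), which is what makes a bounded summary possible.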
๐Ÿ“ Abstract
Protein structural ensembles from NMR spectroscopy capture biologically important conformational heterogeneity, but it remains difficult to determine whether observed variation reflects coordinated motion or noise-like artifacts. We evaluate the Spectral Coherence Index (SCI), a model-free, rotation-invariant summary derived from the participation-ratio effective rank of the inter-model pairwise distance-variance matrix. Under grouped primary analysis of a Main110 cohort of 110 NMR ensembles (30--403 residues; 10--30 models per entry), SCI separated experimental ensembles from matched synthetic incoherent controls with AUC-ROC $= 0.973$ and Cliff's $\delta = -0.945$. Relative to an internal 27-protein pilot, discrimination softened modestly, showing that pilot-era thresholds do not transfer perfectly to a larger, more heterogeneous cohort: the primary operating point $\tau = 0.811$ yielded 95.5\% sensitivity and 89.1\% specificity. PDB-level sensitivity remained nearly unchanged (AUC $= 0.972$), and an independent 11-protein holdout reached AUC $= 0.983$. Across 5-fold grouped stratified cross-validation and leave-one-function-class-out testing, SCI remained strong (AUC $= 0.968$ and $0.971$), although $\sigma_{R_g}$ was the stronger single-feature discriminator and a QC-augmented multifeature model generalized best (AUC $= 0.989$ and $0.990$). Residue-level validation linked SCI-derived contributions to experimental RMSF across 110 proteins and showed broad concordance with GNM-based flexibility patterns. Rescue analyses showed that Main110 softening arose mainly from size and ensemble normalization rather than from loss of spectral signal. Together, these results establish SCI as an interpretable, bounded coherence summary that is most useful when embedded in a multimetric QC workflow for heterogeneous protein ensembles.
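The abstract reports both an AUC and a Cliff's delta for the same two-group comparison. The two statistics are tied together: both count how often a score from one group exceeds a score from the other, so delta equals 2 × AUC − 1 when both are computed in the same group order. The sketch below illustrates that identity on made-up scores; the function name and the toy values are hypothetical and are not taken from the paper.

```python
import numpy as np

def auc_and_cliffs_delta(group_a, group_b):
    """Pairwise-comparison AUC and Cliff's delta for two score samples.

    AUC   = P(a > b) + 0.5 * P(a == b)
    delta = P(a > b) - P(a < b)
    so delta == 2 * AUC - 1 for the same group order.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    diff = a[:, None] - b[None, :]  # every a-vs-b comparison
    n_pairs = diff.size
    greater = np.count_nonzero(diff > 0)
    less = np.count_nonzero(diff < 0)
    ties = n_pairs - greater - less
    auc = (greater + 0.5 * ties) / n_pairs
    delta = (greater - less) / n_pairs
    return auc, delta

if __name__ == "__main__":
    # Toy scores only, chosen for illustration.
    auc, delta = auc_and_cliffs_delta([0.9, 0.8, 0.7, 0.4], [0.5, 0.3, 0.2])
    print(auc, delta, 2 * auc - 1)
```

Applied to the reported figures, 2 × 0.973 − 1 = 0.946, which matches the magnitude of the reported delta of −0.945 to rounding. The negative sign would be consistent with delta having been computed in the opposite group order from the AUC, although the abstract does not state its sign convention.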
Problem

Research questions and friction points this paper is trying to address.

protein structural ensemble
NMR spectroscopy
conformational heterogeneity
quality assessment
coordinated motion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Coherence Index
model-free metric
protein structural ensemble
NMR quality assessment
rotation-invariant
Yuda Bi
Tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, GA 30303 USA
Huaiwen Zhang
Northeastern University
Jingnan Sun
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD 21287 USA
Vince D. Calhoun
Director, Translational Research in Neuroimaging and Data Science (TReNDS; GSU/GAtech/Emory)
brain imaging/MRI/EEG/MEG, data fusion, data science, image analysis, mental illness