DIA-CLIP: a universal representation learning framework for zero-shot DIA proteomics

📅 2026-02-02
🏛️ bioRxiv
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current DIA mass spectrometry analysis methods rely on within-run semi-supervised rescoring, which is prone to overfitting and generalizes poorly across species and experimental conditions. This work proposes DIA-CLIP, a pre-trained model that brings universal cross-modal representation learning to DIA proteomics. By combining a dual-encoder contrastive learning framework with an encoder-decoder architecture, DIA-CLIP builds a unified embedding space for peptides and their spectral features, enabling zero-shot, high-precision peptide-spectrum matching without within-run training. The authors report that the method consistently outperforms state-of-the-art tools across diverse benchmarks, with up to a 45% increase in protein identifications and a 12% reduction in entrapment identifications. They also highlight its potential for applications such as single-cell and spatial proteomics.

📝 Abstract
Data-independent acquisition mass spectrometry (DIA-MS) has established itself as a cornerstone of proteomic profiling and large-scale systems biology, offering unparalleled depth and reproducibility. Current DIA analysis frameworks, however, require semi-supervised training within each run for peptide-spectrum match (PSM) re-scoring. This approach is prone to overfitting and lacks generalizability across diverse species and experimental conditions. Here, we present DIA-CLIP, a pre-trained model shifting the DIA analysis paradigm from semi-supervised training to universal cross-modal representation learning. By integrating a dual-encoder contrastive learning framework with an encoder-decoder architecture, DIA-CLIP establishes a unified cross-modal representation for peptides and corresponding spectral features, achieving high-precision, zero-shot PSM inference. Extensive evaluations across diverse benchmarks demonstrate that DIA-CLIP consistently outperforms state-of-the-art tools, yielding up to a 45% increase in protein identification while achieving a 12% reduction in entrapment identifications. Moreover, DIA-CLIP holds immense potential for diverse practical applications, such as single-cell and spatial proteomics, where its enhanced identification depth facilitates the discovery of novel biomarkers and the elucidation of intricate cellular mechanisms.
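To make the dual-encoder contrastive idea concrete, here is a minimal sketch of a generic CLIP-style symmetric contrastive objective and cosine-similarity scoring. It is not DIA-CLIP's implementation: the function names, the temperature of 0.07, and the random arrays standing in for encoder outputs are all assumptions for illustration, and the encoder-decoder component mentioned in the abstract is not modeled.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale each row to unit length so dot products become cosine similarities.
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def clip_style_loss(peptide_emb, spectrum_emb, temperature=0.07):
    # Hypothetical inputs: row i of each array is the embedding of the
    # i-th matched peptide/spectrum pair from two separate encoders.
    p = l2_normalize(peptide_emb)
    s = l2_normalize(spectrum_emb)
    logits = p @ s.T / temperature  # pairwise similarity matrix
    idx = np.arange(len(p))         # true pairs sit on the diagonal
    loss_p2s = -log_softmax(logits, axis=1)[idx, idx].mean()  # peptide -> spectrum
    loss_s2p = -log_softmax(logits, axis=0)[idx, idx].mean()  # spectrum -> peptide
    return 0.5 * (loss_p2s + loss_s2p)

def zero_shot_scores(peptide_emb, spectrum_emb):
    # Score candidate pairs by cosine similarity, with no per-run training.
    return l2_normalize(peptide_emb) @ l2_normalize(spectrum_emb).T

# Stand-in embeddings (assumption: real ones would come from trained encoders).
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned_loss = clip_style_loss(emb, emb)          # every pair correctly matched
mismatched_loss = clip_style_loss(emb, emb[::-1])  # every pair mismatched
```

In this toy setup the loss is low when matched pairs share an embedding and high when the pairing is scrambled, which is the property a shared peptide-spectrum embedding space relies on for zero-shot scoring.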
Problem

Research questions and friction points this paper is trying to address.

DIA-MS
peptide-spectrum match
zero-shot
generalizability
overfitting
Innovation

Methods, ideas, or system contributions that make the work stand out.

zero-shot learning
contrastive learning
cross-modal representation
DIA-MS
universal representation learning
Yucheng Liao
Center for Machine Learning Research, Peking University, Beijing, China 100871; AI for Science Institute, Beijing, China 100080; State Key Laboratory of Medical Proteomics, Beijing, China 102206
Han Wen
AI for Science Institute, Beijing, China 100080; State Key Laboratory of Medical Proteomics, Beijing, China 102206
Weinan E
Professor of Mathematics, Princeton University
applied mathematics
Weijie Zhang
University of Kansas Medical Center
Inverse planning, particle therapy