Regressor-guided Diffusion Model for De Novo Peptide Sequencing with Explicit Mass Control

📅 2026-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation of current deep learning approaches to de novo peptide sequencing: they frequently neglect consistency between the predicted peptide and the experimentally observed precursor mass, which often yields physically implausible results. To overcome this, we propose DiffuNovo, the first de novo peptide sequencing method built on a diffusion model. Our approach enforces the mass constraint explicitly through a dual mechanism: a peptide-level mass loss during training and, during inference, a regression-based guidance module that steers gradient updates in the latent space. Embedding this physical prior explicitly improves both the plausibility and the accuracy of the predicted sequences. On standard benchmarks, DiffuNovo outperforms state-of-the-art methods and substantially reduces mass prediction error.
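The mass-consistency check referred to above can be made concrete with a minimal sketch, assuming unmodified residues, monoisotopic masses, and a precursor observed as [M + zH]^z+. The function names and the 20 ppm tolerance are illustrative choices, not taken from the paper, and the paper's exact mass-error metric is not stated in this summary.

```python
# Monoisotopic residue masses in Da (unmodified residues only; no PTMs assumed).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
WATER = 18.010565   # H2O added once per linear peptide
PROTON = 1.007276   # proton mass in Da


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of an unmodified linear peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of a precursor observed as [M + zH]^z+ at the given m/z."""
    return (mz - PROTON) * charge


def mass_error_ppm(seq: str, mz: float, charge: int) -> float:
    """Signed error of the predicted peptide mass relative to the precursor, in ppm."""
    observed = precursor_neutral_mass(mz, charge)
    return (peptide_mass(seq) - observed) / observed * 1e6


def is_mass_consistent(seq: str, mz: float, charge: int, tol_ppm: float = 20.0) -> bool:
    """True if the predicted peptide matches the precursor mass within tol_ppm (hypothetical default)."""
    return abs(mass_error_ppm(seq, mz, charge)) <= tol_ppm


if __name__ == "__main__":
    mz, z = 400.68726, 2  # [M + 2H]2+ precursor of PEPTIDE
    for candidate in ("PEPTIDE", "PEPTLDE", "PEPTIDEG"):
        print(candidate, round(mass_error_ppm(candidate, mz, z), 2),
              is_mass_consistent(candidate, mz, z))
```

Note that PEPTLDE also passes, because leucine and isoleucine are isobaric: matching the precursor mass is a necessary condition for a correct prediction, not a sufficient one.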

📝 Abstract
The discovery of novel proteins relies on sensitive protein identification, for which de novo peptide sequencing (DNPS) from mass spectra is a crucial approach. While deep learning has advanced DNPS, existing models inadequately enforce the fundamental mass consistency constraint: a predicted peptide's mass must match the experimentally measured precursor mass. Previous DNPS methods often treat this critical information as a simple input feature or use it in post-processing, leading to numerous implausible predictions that do not adhere to this fundamental physical property. To address this limitation, we introduce DiffuNovo, a novel regressor-guided diffusion model for de novo peptide sequencing that provides explicit peptide-level mass control. Our approach integrates the mass constraint at two critical stages: during training, a novel peptide-level mass loss guides model optimization, while at inference, regressor-based guidance, applied as gradient-based updates in the latent space, steers generation so that the predicted peptide adheres to the mass constraint. Comprehensive evaluations on established benchmarks demonstrate that DiffuNovo surpasses state-of-the-art methods in DNPS accuracy. Additionally, as the first DNPS model to employ a diffusion model as its core backbone, DiffuNovo leverages the controllability of the diffusion architecture and achieves a significant reduction in mass error, thereby producing much more physically plausible peptides. These innovations represent a substantial advancement toward robust and broadly applicable DNPS. The source code is available in the supplementary material.
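The general idea of regressor-guided updates in a latent space can be illustrated with a toy sketch. This is not DiffuNovo's implementation: here the "latent" is a hypothetical list of per-position amino-acid logits, and the "regressor" is the analytic expected peptide mass under a per-position softmax, standing in for the learned regressor described in the abstract. The squared error between that expected mass and the precursor mass is one plausible form of a peptide-level mass loss; the abstract does not state the paper's exact loss, and the names `expected_mass`, `mass_loss`, and `guidance_step` are illustrative.

```python
import math

# Monoisotopic residue masses in Da (unmodified residues only).
RESIDUE_MASS = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}
AMINO_ACIDS = sorted(RESIDUE_MASS)
MASSES = [RESIDUE_MASS[aa] for aa in AMINO_ACIDS]
WATER = 18.010565


def softmax(logits):
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_mass(latent):
    """Hypothetical stand-in regressor: expected neutral peptide mass when each
    position's residue is drawn from softmax(logits)."""
    return WATER + sum(
        sum(p * m for p, m in zip(softmax(z), MASSES)) for z in latent
    )


def mass_loss(latent, target_mass):
    """Squared error between the expected mass and the precursor neutral mass."""
    return (expected_mass(latent) - target_mass) ** 2


def guidance_step(latent, target_mass, step_size):
    """One gradient-descent step on mass_loss with respect to the latent logits."""
    err = expected_mass(latent) - target_mass
    updated = []
    for z in latent:
        p = softmax(z)
        pos_mean = sum(pi * m for pi, m in zip(p, MASSES))
        # d(expected mass)/d z_a = p_a * (m_a - pos_mean); the chain rule adds 2 * err.
        grad = [2.0 * err * pi * (m - pos_mean) for pi, m in zip(p, MASSES)]
        updated.append([zi - step_size * g for zi, g in zip(z, grad)])
    return updated


if __name__ == "__main__":
    target = 799.36  # neutral precursor mass in Da, e.g. for PEPTIDE
    latent = [[0.0] * len(AMINO_ACIDS) for _ in range(7)]  # 7 positions, uniform logits
    print(f"before: {expected_mass(latent):.2f} Da")
    for _ in range(200):
        latent = guidance_step(latent, target, step_size=1e-4)
    print(f"after:  {expected_mass(latent):.2f} Da")
```

In a full diffusion sampler, a step like `guidance_step` would presumably be interleaved with the denoising steps, so that the denoiser supplies sequence evidence from the spectrum while the guidance term pulls the total mass toward the precursor. On its own, the guidance only controls mass; it says nothing about which sequence is correct.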
Problem

Research questions and friction points this paper is trying to address.

de novo peptide sequencing
mass consistency
precursor mass constraint
peptide mass prediction
mass spectrometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
de novo peptide sequencing
mass constraint
regressor-guided generation
latent space guidance
Shaorong Chen
Zhejiang University, Hangzhou, China, 310058; AI Lab, Westlake University, Hangzhou, China, 310030
Jingbo Zhou
Westlake University & Zhejiang University
AI4Science, LLMs
Jun Xia
AIMS Lab, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, 511453; The Hong Kong University of Science and Technology, Hong Kong, China, 999077