🤖 AI Summary
This work addresses a critical limitation in current deep learning approaches to de novo peptide sequencing: they frequently neglect consistency between the predicted peptide and the experimentally observed precursor mass, which often yields physically implausible results. To overcome this, we propose DiffuNovo, the first method to apply diffusion models to this task. Our approach explicitly incorporates the mass constraint at two stages: a peptide-level mass loss during training, and a regression-based guidance module that applies gradient updates in the latent space during inference. This explicit embedding of physical priors enhances both the plausibility and the accuracy of predicted sequences. Evaluated on standard benchmarks, DiffuNovo outperforms state-of-the-art methods and achieves a substantial reduction in mass prediction error.
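As an illustration of what consistency with the precursor mass means in practice, the following minimal sketch checks whether a candidate peptide's theoretical mass matches the neutral mass implied by an observed precursor m/z and charge. It is not taken from DiffuNovo: the function names and the 20 ppm tolerance are assumptions chosen for illustration, and only unmodified residues are handled. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O added for the peptide termini
PROTON = 1.007276   # mass of one charge-carrying proton


def peptide_mass(seq: str) -> float:
    """Theoretical neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def precursor_mass(mz: float, charge: int) -> float:
    """Neutral mass implied by an observed precursor m/z and charge state."""
    return (mz - PROTON) * charge


def is_mass_consistent(seq: str, mz: float, charge: int,
                       tol_ppm: float = 20.0) -> bool:
    """True if the relative mass error is within tol_ppm (assumed tolerance)."""
    observed = precursor_mass(mz, charge)
    error_ppm = (peptide_mass(seq) - observed) / observed * 1e6
    return abs(error_ppm) <= tol_ppm


# Hypothetical usage: a doubly charged precursor observed at m/z 400.687248.
consistent = is_mass_consistent("PEPTIDE", 400.687248, 2)
inconsistent = is_mass_consistent("PEPTIDK", 400.687248, 2)
```

Because leucine and isoleucine share the same mass, a check of this kind cannot tell them apart. It only rules out sequences whose total mass disagrees with the measurement.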
📝 Abstract
The discovery of novel proteins relies on sensitive protein identification, for which de novo peptide sequencing (DNPS) from mass spectra is a crucial approach. While deep learning has advanced DNPS, existing models inadequately enforce the fundamental mass consistency constraint: a predicted peptide's mass must match the experimentally measured precursor mass. Previous DNPS methods often treat this critical information as a simple input feature or use it only in post-processing, leading to numerous implausible predictions that violate this fundamental physical property. To address this limitation, we introduce DiffuNovo, a novel regressor-guided diffusion model for de novo peptide sequencing that provides explicit peptide-level mass control. Our approach integrates the mass constraint at two critical stages: during training, a novel peptide-level mass loss guides model optimization, while at inference, regressor-based guidance applies gradient-based updates in the latent space to steer generation so that the predicted peptide adheres to the mass constraint. Comprehensive evaluations on established benchmarks demonstrate that DiffuNovo surpasses state-of-the-art methods in DNPS accuracy. Additionally, as the first DNPS model to employ a diffusion model as its core backbone, DiffuNovo leverages the controllability of the diffusion architecture and achieves a significant reduction in mass error, thereby producing much more physically plausible peptides. These innovations represent a substantial advance toward robust and broadly applicable DNPS. The source code is available in the supplementary material.
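The inference-time guidance described above can be pictured with a toy sketch. It assumes a latent vector `z`, a differentiable regressor that predicts peptide mass from `z`, and a gradient step on the squared difference between predicted and target mass. Everything here is hypothetical: the linear regressor, its weights `W` and `B`, the step size, and the function names are stand-ins, not DiffuNovo's actual architecture. In a real diffusion sampler, such updates would be interleaved with denoising steps, which are omitted here.

```python
def predict_mass(z, w, b):
    """Toy linear regressor: predicted peptide mass from a latent vector."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def guidance_step(z, w, b, target_mass, step_size):
    """One gradient-descent step on (predict_mass(z) - target_mass) ** 2.

    For a linear regressor the gradient with respect to z is
    2 * residual * w, so no autodiff library is needed in this sketch.
    """
    residual = predict_mass(z, w, b) - target_mass
    grad = [2.0 * residual * wi for wi in w]
    return [zi - step_size * gi for zi, gi in zip(z, grad)]


# Hypothetical regressor parameters and starting latent.
W = [1.0, 2.0, -1.0, 0.5]
B = 800.0
z = [0.3, -0.2, 0.1, 0.4]
target = 799.36  # target precursor mass in daltons

errors = []
for _ in range(20):
    errors.append(abs(predict_mass(z, W, B) - target))
    z = guidance_step(z, W, B, target, step_size=0.05)

final_error = abs(predict_mass(z, W, B) - target)
```

With these particular values each step shrinks the mass error by a constant factor, so the predicted mass converges to the target. A learned nonlinear regressor would need automatic differentiation and offers no such guarantee.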