Diffusion Decoding for Peptide De Novo Sequencing

📅 2025-07-14
🤖 AI Summary
Problem: Autoregressive decoders in de novo peptide sequencing suffer from cascading errors and underuse high-confidence spectral regions. Method: This paper investigates non-autoregressive decoding with diffusion decoders adapted to discrete data, which can start generation from any peptide segment rather than strictly left to right. It compares three diffusion decoder designs, knapsack beam search, and several loss functions, including the DINOISER loss. Contribution/Results: Knapsack beam search did not improve the metrics, and simply swapping the Transformer decoder for a diffusion decoder lowered performance. The best diffusion design trained with the DINOISER loss improved amino acid recall by 0.373 over the autoregressive Casanovo baseline, a statistically significant gain, although peptide-level precision and recall remained 0. The results point to diffusion decoding as a promising direction for improving sensitivity rather than a finished replacement for autoregressive decoders.

📝 Abstract
Peptide de novo sequencing reconstructs amino acid sequences from tandem mass spectrometry data without relying on existing protein sequence databases. Deep learning approaches such as Casanovo mainly use autoregressive decoders that predict amino acids sequentially. As a result, they suffer from cascading errors and fail to exploit high-confidence regions effectively. To address these issues, this paper investigates diffusion decoders adapted to the discrete data domain. These decoders allow sequence generation to start from any peptide segment, with the aim of improving prediction accuracy. We experiment with three diffusion decoder designs, knapsack beam search, and various loss functions. We find that knapsack beam search did not improve performance metrics and that simply replacing the transformer decoder with a diffusion decoder lowered performance. Although peptide precision and recall were still 0, the best diffusion decoder design with the DINOISER loss function obtained a statistically significant improvement of 0.373 in amino acid recall over the baseline autoregressive decoder-based Casanovo model. These findings highlight the potential of diffusion decoders to enhance model sensitivity and to drive further advances in peptide de novo sequencing.
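The abstract reports a gain in amino acid recall while peptide precision and recall stay at 0. The sketch below illustrates how the two levels of metric can diverge. It is a simplified, position-wise illustration and not the paper's evaluation code: benchmark evaluations in this field typically match residues by mass within a tolerance, whereas this example compares letters at the same index. The function names and the toy peptides are hypothetical.

```python
def amino_acid_matches(predicted: str, reference: str) -> int:
    """Count positions where the predicted and reference residues agree.

    Simplifying assumption: residues are compared letter by letter at the
    same index, not by mass within a tolerance.
    """
    return sum(p == r for p, r in zip(predicted, reference))


def evaluate(predictions: list[str], references: list[str]) -> dict[str, float]:
    """Compute amino-acid-level and peptide-level scores over a dataset."""
    n_match = n_pred = n_ref = n_full = 0
    for pred, ref in zip(predictions, references):
        n_match += amino_acid_matches(pred, ref)
        n_pred += len(pred)          # residues the model emitted
        n_ref += len(ref)            # residues in the ground truth
        n_full += int(pred == ref)   # peptide counts only if fully correct
    return {
        "aa_precision": n_match / n_pred if n_pred else 0.0,
        "aa_recall": n_match / n_ref if n_ref else 0.0,
        "peptide_recall": n_full / len(references) if references else 0.0,
    }


# Toy data (hypothetical): each prediction has exactly one wrong residue.
references = ["PEPTIDEK", "ACDEFGHK"]
predictions = ["PEPTADEK", "ACDEFGHR"]
scores = evaluate(predictions, references)
print(scores)
```

In this toy case 14 of 16 residues are correct, so amino acid recall is 0.875, yet no peptide is fully correct and peptide recall is 0. A single wrong residue per peptide is enough to produce that gap.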
Problem

Research questions and friction points this paper is trying to address.

Improving peptide sequencing accuracy without databases
Addressing cascading errors in autoregressive decoders
Exploring diffusion decoders for better amino acid recall
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion decoders for discrete peptide sequencing
Knapsack beam search and loss function experiments
DINOISER loss improves amino acid recall
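The summary does not describe how knapsack beam search is implemented. One common reading of the idea is that a partial candidate is kept only if the precursor mass it has not yet accounted for can still be filled by some combination of residues. The sketch below shows that feasibility check alone, under stated assumptions: the standard monoisotopic residue masses are used with unmodified cysteine, masses are discretized into 0.01 Da bins (so rounding error accumulates over long peptides), and the names `build_mass_table` and `can_complete` are hypothetical.

```python
# Standard monoisotopic residue masses in Da (unmodified; I shares L's mass).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "K": 128.09496,
    "E": 129.04259, "M": 131.04049, "H": 137.05891, "F": 147.06841,
    "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def build_mass_table(max_mass_da: float, bin_da: float = 0.01) -> list[bool]:
    """Unbounded-knapsack table: reachable[b] is True if some multiset of
    residues sums to mass bin b."""
    n_bins = int(round(max_mass_da / bin_da)) + 1
    steps = sorted({int(round(m / bin_da)) for m in RESIDUE_MASS.values()})
    reachable = [False] * n_bins
    reachable[0] = True  # the empty suffix has mass zero
    for b in range(1, n_bins):
        reachable[b] = any(s <= b and reachable[b - s] for s in steps)
    return reachable


def can_complete(reachable: list[bool], remaining_da: float,
                 bin_da: float = 0.01, tol_bins: int = 2) -> bool:
    """Return True if the remaining mass is reachable within +/- tol_bins."""
    center = int(round(remaining_da / bin_da))
    if center + tol_bins < 0:
        return False  # candidate already overshoots the precursor mass
    lo = max(0, center - tol_bins)
    hi = min(len(reachable) - 1, center + tol_bins)
    return any(reachable[lo:hi + 1])


table = build_mass_table(300.0)
print(can_complete(table, 57.02146 + 71.03711))  # G + A fits
print(can_complete(table, 30.0))                 # lighter than any residue
```

A beam search could call `can_complete` on each partial candidate and discard those that return False. The summary reports that this kind of constraint did not improve the metrics in the paper's experiments.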