SpecMER: Fast Protein Generation with K-mer Guided Speculative Decoding

📅 2025-09-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Autoregressive protein generation models suffer from high inference latency, poor biological plausibility due to the absence of structural and functional priors, and a shift in the likelihood distribution of generated sequences, all of which hinder high-throughput screening applications. To address these limitations, the authors propose SpecMER, a k-mer-guided speculative decoding framework. SpecMER extracts conserved k-mer motifs from multiple sequence alignments and uses them to score candidate sequences from a lightweight draft model in parallel; a large target model then verifies and refines the selected candidates. This explicitly incorporates biological priors into the decoding process. Experiments show that SpecMER achieves a 24-32% inference speedup over standard autoregressive decoding, along with higher token acceptance rates and improved sequence log-likelihoods. By combining computational efficiency with biologically informed constraints, SpecMER offers an efficient approach to de novo protein design.

๐Ÿ“ Abstract
Autoregressive models have transformed protein engineering by enabling the generation of novel protein sequences beyond those found in nature. However, their sequential inference introduces significant latency, limiting their utility in high-throughput protein screening. Speculative decoding accelerates generation by employing a lightweight draft model to sample tokens, which a larger target model then verifies and refines. Yet, in protein sequence generation, draft models are typically agnostic to the structural and functional constraints of the target protein, leading to biologically implausible outputs and a shift in the likelihood distribution of generated sequences. We introduce SpecMER (Speculative Decoding via k-mer Guidance), a novel framework that incorporates biological, structural, and functional priors using k-mer motifs extracted from multiple sequence alignments. By scoring candidate sequences in parallel and selecting those most consistent with known biological patterns, SpecMER significantly improves sequence plausibility while retaining the efficiency of speculative decoding. SpecMER achieves 24-32% speedup over standard autoregressive decoding, along with higher acceptance rates and improved sequence likelihoods.
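The draft-then-verify step described in the abstract can be illustrated with the acceptance rule commonly used in the general speculative decoding literature: a drafted token is accepted with probability min(1, p/q), where p and q are the target and draft probabilities of that token. The following is a minimal sketch under that assumption, not SpecMER's confirmed procedure. The function name `verify_draft` and the dictionary-based probability tables are hypothetical, and the resampling step that normally follows a rejection is omitted for brevity.

```python
import random


def verify_draft(draft_tokens, q_probs, p_probs, rng):
    """Return the prefix of draft_tokens that the target model accepts.

    q_probs[i][tok] and p_probs[i][tok] are the draft and target
    probabilities of token `tok` at position i (hypothetical toy tables,
    standing in for real model outputs).
    """
    accepted = []
    for i, tok in enumerate(draft_tokens):
        ratio = p_probs[i][tok] / q_probs[i][tok]
        # Standard rule: accept with probability min(1, p / q).
        if rng.random() < min(1.0, ratio):
            accepted.append(tok)
        else:
            # First rejection ends verification; a full implementation
            # would resample this position from the target model.
            break
    return accepted


rng = random.Random(0)
draft = ["M", "K", "T"]
q = [{"M": 0.5}, {"K": 0.5}, {"T": 0.5}]

# Target agrees with the draft everywhere: every token is accepted.
same = verify_draft(draft, q, q, rng)

# Target assigns zero probability to the second token: only "M" survives.
p_reject = [{"M": 0.5}, {"K": 0.0}, {"T": 0.5}]
partial = verify_draft(draft, q, p_reject, rng)
```

The longer the accepted prefix, the fewer sequential target-model calls are needed, which is why a higher acceptance rate translates into lower latency.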
Problem

Research questions and friction points this paper is trying to address.

Accelerates protein sequence generation to reduce inference latency
Improves biological plausibility by incorporating structural constraints
Maintains efficiency while enhancing sequence acceptance rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses k-mer motifs from sequence alignments
Scores candidate sequences for biological plausibility
Retains speculative decoding efficiency with higher acceptance
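The k-mer guidance idea in the points above can be sketched as follows: build a table of k-mer counts from aligned sequences, score each drafted candidate by how often its k-mers appear in that table, and keep the best-scoring candidate. This is an illustrative sketch only. The scoring function (mean k-mer count), the helper names `kmer_counts`, `kmer_score`, and `select_best`, and the toy alignment are all assumptions, not the paper's actual scoring scheme.

```python
from collections import Counter


def kmer_counts(aligned_seqs, k):
    """Count k-mers across aligned sequences, ignoring gap characters."""
    counts = Counter()
    for seq in aligned_seqs:
        ungapped = seq.replace("-", "")
        for i in range(len(ungapped) - k + 1):
            counts[ungapped[i:i + k]] += 1
    return counts


def kmer_score(candidate, counts, k):
    """Mean table count of the candidate's k-mers (0.0 if shorter than k)."""
    n = len(candidate) - k + 1
    if n <= 0:
        return 0.0
    return sum(counts[candidate[i:i + k]] for i in range(n)) / n


def select_best(candidates, counts, k):
    """Pick the candidate most consistent with the k-mer table."""
    return max(candidates, key=lambda c: kmer_score(c, counts, k))


# Toy alignment (hypothetical data); "-" marks alignment gaps.
msa = ["MKT-AYIA", "MKTVAYLA", "MRT-AYIA"]
table = kmer_counts(msa, k=3)

# Candidate continuations as a draft model might propose them.
drafts = ["MKTAY", "MKWWW", "AYIAK"]
best = select_best(drafts, table, k=3)
```

Because each candidate is scored independently, the scoring step can be run in parallel across candidates, in line with the parallel scoring the summary describes.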
Thomas Walton
Georgia Institute of Technology
Darin Tsui
Georgia Institute of Technology
Aryan Musharaf
Georgia Institute of Technology
Amirali Aghazadeh
ECE, Georgia Tech
AI, Machine Learning, Signal Processing, Computational Biology, Molecular Design