RadDiff: Retrieval-Augmented Denoising Diffusion for Protein Inverse Folding

📅 2025-11-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Protein inverse folding—designing functional sequences from target 3D structures—remains a central challenge in computational protein engineering. Existing approaches either neglect evolutionary information or rely on parameter-heavy, computationally expensive protein language models (PLMs) with limited scalability. We propose RadDiff, a retrieval-augmented denoising diffusion model that leverages structural similarity to retrieve nearby templates from large-scale protein databases, then constructs position-specific amino acid profiles as conditional priors to guide sequence generation. RadDiff integrates hierarchical structural retrieval, residue-level alignment, a lightweight ensemble module, and diffusion-based modeling, achieving superior parameter efficiency and generalization. On CATH, PDB, and TS50 benchmarks, RadDiff improves sequence recovery rates by up to 19% over prior methods; generated sequences exhibit high foldability, and performance scales consistently with database size.
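Sequence recovery, the metric quoted in the summary, is conventionally the fraction of positions at which the designed residue matches the native one. The sketch below shows that arithmetic only. The function name `sequence_recovery` is illustrative and is not taken from the paper, and it assumes the two sequences are already the same length with no alignment gaps.

```python
def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native residue.

    Assumption: both sequences describe the same backbone, so they have equal
    length and position i in one corresponds to position i in the other.
    """
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


# Toy example: three of four positions agree.
print(sequence_recovery("ACDE", "ACDF"))  # 0.75
```

Benchmark figures are typically averaged over many target structures; how RadDiff aggregates per-protein values is not stated here.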

📝 Abstract

Protein inverse folding, the design of an amino acid sequence for a target 3D structure, is a fundamental problem in computational protein engineering. Existing methods either generate sequences without leveraging external knowledge or rely on protein language models (PLMs). The former omit the evolutionary information stored in protein databases, while the latter are parameter-inefficient and hard to adapt to ever-growing protein data. To overcome these drawbacks, we propose retrieval-augmented denoising diffusion (RadDiff), a novel method for protein inverse folding. Given the target protein backbone, RadDiff uses a hierarchical search strategy to efficiently retrieve structurally similar proteins from large protein databases. The retrieved structures are then aligned residue-by-residue to the target to construct a position-specific amino acid profile, which serves as an evolutionarily informed prior that conditions the denoising process. A lightweight integration module is further designed to incorporate this prior effectively. Experimental results on the CATH, PDB, and TS50 datasets show that RadDiff consistently outperforms existing methods, improving sequence recovery rate by up to 19%. The results also demonstrate that RadDiff generates highly foldable sequences and scales effectively with database size.
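To make the idea of a position-specific amino acid profile concrete, here is a minimal sketch of how residue counts from aligned templates could be turned into per-position probabilities. It is an illustration under stated assumptions, not the paper's procedure: the name `build_profile`, the gap character `-`, the uniform weighting of templates, and the additive pseudocount are all choices made for this example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def build_profile(target_length, aligned_templates, pseudocount=1.0):
    """Return a target_length x 20 list of per-position residue probabilities.

    aligned_templates: list of strings, each of length target_length, where
    character i is the template residue aligned to target position i, or '-'
    if that template has no residue aligned there (assumed input format).
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    profile = []
    for pos in range(target_length):
        # Start from the pseudocount so positions with no aligned template
        # residue fall back to a uniform distribution.
        counts = [pseudocount] * len(AMINO_ACIDS)
        for template in aligned_templates:
            aa = template[pos]
            if aa in index:  # skip gaps and non-standard symbols
                counts[index[aa]] += 1.0
        total = sum(counts)
        profile.append([c / total for c in counts])
    return profile


# Toy example: three templates aligned to a 3-residue target.
profile = build_profile(3, ["ACD", "AC-", "AGD"])
print(round(profile[0][AMINO_ACIDS.index("A")], 3))
```

Each row sums to 1, so it can be read as a prior distribution over residues at that position. How RadDiff weights templates or handles gaps is not specified in the abstract.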

Problem

Research questions and friction points this paper is trying to address.

Designs amino acid sequences from target protein 3D structures

Overcomes two limitations of prior methods: ignoring evolutionary information, or depending on parameter-inefficient PLMs

Uses retrieved structural data to guide sequence generation efficiently

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical search retrieves structurally similar proteins

Aligns retrieved structures to construct evolutionary-informed amino acid profile

Lightweight integration module incorporates prior into denoising diffusion process
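The hierarchical search can be pictured as a generic coarse-to-fine retrieval: a cheap similarity filter narrows the database, and a costlier score reranks the survivors. The sketch below illustrates that general pattern only. The actual stages, descriptors, and scoring functions used by RadDiff are not given on this page, so `hierarchical_retrieve`, the cosine filter on fixed-length vectors, and the caller-supplied `fine_score` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def hierarchical_retrieve(query_vec, database, fine_score, coarse_k=100, final_k=10):
    """Two-stage retrieval over database entries of the form (entry_id, vec, payload).

    Stage 1 (coarse): keep the coarse_k entries whose vectors are most similar
    to query_vec under cosine similarity (a cheap stand-in descriptor).
    Stage 2 (fine): rerank those entries with fine_score(payload), which stands
    in for an expensive structural comparison, and return the top final_k ids.
    """
    coarse = sorted(database, key=lambda e: cosine(query_vec, e[1]), reverse=True)
    coarse = coarse[:coarse_k]
    reranked = sorted(coarse, key=lambda e: fine_score(e[2]), reverse=True)
    return [entry_id for entry_id, _, _ in reranked[:final_k]]


# Toy example: payloads are precomputed fine scores for brevity.
db = [("a", [1.0, 0.0], 0.2), ("b", [0.9, 0.1], 0.9), ("c", [0.0, 1.0], 1.0)]
print(hierarchical_retrieve([1.0, 0.0], db, fine_score=lambda s: s, coarse_k=2, final_k=1))
```

In the toy run, entry `c` has the highest fine score but is dropped by the coarse filter, which shows the trade-off of this design: the expensive comparison runs on only `coarse_k` candidates, at the risk of missing matches the cheap filter ranks poorly.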

🔎 Similar Papers

AlphaFolding: 4D Diffusion for Dynamic Protein Structure Prediction with Reference and Motion Guidance