Navigating heterogeneous protein landscapes through geometry-aware smoothing

📅 2026-02-11
🤖 AI Summary
Conventional generative models rely on a fixed global noise level, which makes it hard to stay faithful to functional clusters while also exploring sparse regions of the highly sparse, heterogeneous protein sequence space; the result is often non-functional sequences. To overcome this limitation, the authors propose a geometry-aware diffusion generative framework built around a density-dependent smoothing (DDS) mechanism that inversely couples diffusion noise intensity to local sequence density. This design enables fine-grained refinement in high-density regions while allowing controlled exploration in sparse ones. The authors report that the method consistently outperforms existing diffusion and autoregressive models on antibody repertoire modeling, therapeutic antibody design, antimicrobial peptide generation, and coronavirus antibody design, improving both the functionality and the diversity of the generated sequences.

📝 Abstract
The evolutionary fitness landscape of biological molecules is extremely sparse and heterogeneous, with functional sequences forming isolated dense "islands" within a vast combinatorial space of largely non-functional variants. Protein sequences, in particular, exemplify this structure, yet most generative artificial intelligence models implicitly assume a homogeneous data distribution. We show that this assumption fundamentally breaks down in heterogeneous biological sequence spaces: fixed global noise levels impose a destructive trade-off, either oversmoothing dense functional clusters or fragmenting sparse regions and producing non-functional hallucinations. To address this limitation, we introduce Density-Dependent Smoothing (DDS), a geometry-aware generative framework that adapts stochastic smoothing to the local density of the underlying sequence landscape. By inversely coupling diffusion noise to estimated sequence density, DDS enables gentle refinement in high-density functional regions while promoting controlled exploration across sparse regions. Implemented as a plug-in mechanism for discrete molecular sampling, DDS consistently outperforms state-of-the-art diffusion and autoregressive models across antibody repertoires, therapeutic antibody design, antimicrobial peptide generation and coronavirus antibody design. Together, these results show that fixed global smoothing assumptions fundamentally limit generative modeling in sparse biological sequence spaces, and that geometry-aware smoothing removes this constraint, enabling reliable exploration and design previously unattainable with fixed-noise generative models.
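
The idea of inversely coupling noise to local density can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not say how density is estimated or how noise is scheduled, so everything below is an assumption made for illustration. In particular, the k-nearest-neighbour Hamming density proxy, the linear mapping to a per-sequence substitution rate, and the names `local_densities`, `noise_levels`, `corrupt`, `sigma_min` and `sigma_max` are all hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def hamming(a, b):
    """Count mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def local_densities(dataset, k=2):
    """Assumed density proxy: 1 / (1 + mean Hamming distance to the k nearest neighbours).

    Requires k >= 1 and at least two sequences of equal length.
    """
    if len(dataset) < 2:
        raise ValueError("need at least two sequences to estimate density")
    densities = []
    for i, seq in enumerate(dataset):
        # Distances to every other sequence, nearest first.
        dists = sorted(hamming(seq, other) for j, other in enumerate(dataset) if j != i)
        nearest = dists[:k]
        densities.append(1.0 / (1.0 + sum(nearest) / len(nearest)))
    return densities


def noise_levels(densities, sigma_min=0.05, sigma_max=0.5):
    """Map density to a substitution rate inversely (assumed linear schedule).

    The densest sequence gets sigma_min, the sparsest gets sigma_max.
    """
    lo, hi = min(densities), max(densities)
    span = (hi - lo) or 1.0  # avoid division by zero when all densities are equal
    return [sigma_max - (d - lo) / span * (sigma_max - sigma_min) for d in densities]


def corrupt(seq, sigma, rng):
    """Resample each residue uniformly at random with probability sigma."""
    return "".join(rng.choice(AMINO_ACIDS) if rng.random() < sigma else aa for aa in seq)


# Toy data: three near-identical sequences (a dense "island") and one isolated sequence.
dataset = ["ACDEFG", "ACDEFH", "ACDEFK", "WWWWWW"]
sigmas = noise_levels(local_densities(dataset))
rng = random.Random(0)
for seq, sigma in zip(dataset, sigmas):
    print(seq, round(sigma, 2), corrupt(seq, sigma, rng))
```

In this toy setting the three clustered sequences receive the smallest noise level and the isolated one receives the largest, which mirrors the "gentle refinement in dense regions, controlled exploration in sparse regions" behaviour described in the abstract. A fixed global noise level would correspond to returning the same value for every sequence.
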
Problem

Research questions and friction points this paper is trying to address.

heterogeneous protein landscapes
evolutionary fitness landscape
generative modeling
sequence space sparsity
functional sequence islands
Innovation

Methods, ideas, or system contributions that make the work stand out.

Density-Dependent Smoothing
geometry-aware generative modeling
protein sequence design
heterogeneous fitness landscape
adaptive diffusion noise
Authors

Srinivas Anumasa
Post Doctoral Researcher
Machine learning, Diffusion, spiking neural networks, Neural ODE

Barath Chandran
Indian Institute of Technology, Roorkee

Tingting Chen
National University of Singapore
Machine Learning, Computer Vision

Nuwaisir Mohammad Rahman
National University of Singapore

Yingtao Zhu
National University of Singapore

Rushi Shah
National University of Singapore

Hongyu He
National University of Singapore

Peisong Zhang
National University of Singapore

Yizhen Liao
National University of Singapore

Yiming Tang
National University of Singapore

Yong Shen
Xijiao Liverpool University

Tianfan Fu
Nanjing University
AI for Drug, AI for Science, Large Language Model

Rui Qing
Shanghai Jiaotong University

Xiao Li
Peking University

Sebastian Maurer-Stroh
Executive Director, Bioinformatics Institute (BII), A*STAR Singapore
Bioinformatics, Computational Biology, Protein Sequences, Structures, Evolution

Xinyi Su
National University Singapore, Institute of Molecular and Cellular Biology (IMCB)
Biomaterials, Retinal Cell Therapy

Zhizhuo Zhang
GSK.ai, USA

Dianbo Liu
Assistant professor, National University of Singapore
Push the limits of human, machine learning, biomedical sciences