Gumbel-Softmax Flow Matching with Straight-Through Guidance for Controllable Biological Sequence Generation

📅 2025-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Controllable generation of high-dimensional discrete biological sequences (e.g., DNA, peptides, proteins) on the continuous probability simplex remains challenging, because existing simplex flow-matching methods struggle to scale to the higher simplex dimensions that peptides and proteins require.

Method: We propose a flow-matching framework on the simplex built on a Gumbel-Softmax interpolant with a time-dependent temperature, together with a score-matching variant, and Straight-Through Guided Flows (STGFlow). STGFlow enables classifier-guided generation without retraining: classifiers pre-trained on clean sequences steer the unconditional velocity field at inference time, which avoids costly fine-tuning. Our approach combines Gumbel-Softmax relaxations, continuous flow modeling on the simplex, and straight-through estimation.

Contribution/Results: The method addresses the scalability bottleneck of simplex-based flow models in high dimensions. It reports state-of-the-art performance on conditional DNA promoter design, sequence-only protein generation, and target-binding peptide design for rare diseases, with improved generation quality, diversity, and target alignment.
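To make the time-varying temperature interpolation concrete, here is a minimal Python sketch of a Gumbel-Softmax relaxation whose temperature decreases over time. Samples therefore move from a smooth point inside the simplex toward a single vertex. This is an illustration under stated assumptions, not the authors' implementation. The exponential schedule in `temperature` (with `tau_max` and `lam`), the logit weight `beta` placed on the target token, and all function names are hypothetical choices made for this example.

```python
import math
import random
from typing import List


def gumbel_noise(rng: random.Random) -> float:
    """Draw standard Gumbel(0, 1) noise via the inverse CDF, -log(-log(U))."""
    u = max(rng.random(), 1e-12)  # avoid log(0) when U is exactly 0.0
    return -math.log(-math.log(u))


def temperature(t: float, tau_max: float = 20.0, lam: float = 5.0) -> float:
    """Hypothetical schedule tau(t) = tau_max * exp(-lam * t), decreasing in t."""
    return tau_max * math.exp(-lam * t)


def softmax(z: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def gumbel_softmax_interpolant(
    target: int, vocab_size: int, t: float, rng: random.Random, beta: float = 20.0
) -> List[float]:
    """Sample x_t = softmax((logits + g) / tau(t)) on the probability simplex.

    Hypothetical choice: the logits put weight `beta` on the target token and 0
    elsewhere. A high tau (t near 0) gives a smooth point inside the simplex; a
    low tau (t near 1) pushes the sample toward the vertex argmax(logits + g).
    """
    logits = [beta if k == target else 0.0 for k in range(vocab_size)]
    g = [gumbel_noise(rng) for _ in range(vocab_size)]
    tau = temperature(t)
    return softmax([(lk + gk) / tau for lk, gk in zip(logits, g)])


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        x = gumbel_softmax_interpolant(target=3, vocab_size=20, t=t, rng=rng)
        print(f"t={t:.1f}  max prob={max(x):.3f}")
```

As tau(t) approaches 0, the sample approaches the vertex argmax(logits + g). By the Gumbel-max trick, that vertex is distributed as softmax(logits), so a large `beta` makes it the target token with high probability.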

📝 Abstract
Flow matching in the continuous simplex has emerged as a promising strategy for DNA sequence design, but struggles to scale to the higher simplex dimensions required for peptide and protein generation. We introduce Gumbel-Softmax Flow and Score Matching, a generative framework on the simplex based on a novel Gumbel-Softmax interpolant with a time-dependent temperature. Using this interpolant, we introduce Gumbel-Softmax Flow Matching by deriving a parameterized velocity field that transports from smooth categorical distributions to distributions concentrated at a single vertex of the simplex. Alternatively, we present Gumbel-Softmax Score Matching, which learns to regress the gradient of the log probability density (the score). Our framework enables high-quality, diverse generation and scales efficiently to higher-dimensional simplices. To enable training-free guidance, we propose Straight-Through Guided Flows (STGFlow), a classifier-based guidance method that leverages straight-through estimators to steer the unconditional velocity field toward optimal vertices of the simplex. STGFlow enables efficient inference-time guidance using classifiers pre-trained on clean sequences and can be used with any discrete flow method. Together, these components form a robust framework for controllable de novo sequence generation. We demonstrate state-of-the-art performance in conditional DNA promoter design, sequence-only protein generation, and target-binding peptide design for rare disease treatment.
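The abstract describes generation as integrating a parameterized velocity field that carries smooth categorical distributions to a vertex. The sketch below shows such a sampling loop for a single sequence position, using explicit Euler steps. It rests on two assumptions. First, `toy_denoiser` is a hypothetical stand-in for a trained network that predicts a distribution over vertices. Second, the velocity v = (p - x) / (1 - t) is the standard linear-path (rectified-flow style) marginal velocity, swapped in here for simplicity. It is not the Gumbel-Softmax-derived velocity from the paper.

```python
from typing import Callable, List

Simplex = List[float]


def toy_denoiser(x: Simplex, t: float) -> Simplex:
    """Hypothetical stand-in for a trained network p_theta(vertex | x_t, t).

    It sharpens x by raising it to a power that grows with t, so its prediction
    concentrates on the currently largest entry as t approaches 1.
    """
    kappa = 1.0 + 9.0 * t
    w = [xi ** kappa for xi in x]
    s = sum(w)
    return [wi / s for wi in w]


def euler_sample(
    x0: Simplex, denoiser: Callable[[Simplex, float], Simplex], num_steps: int = 50
) -> Simplex:
    """Integrate dx/dt = (p - x) / (1 - t) from t = 0 to t = 1 with Euler steps.

    This is the linear-path velocity written through a predicted vertex
    distribution p; it is a swapped-in choice, not the paper's velocity field.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        p = denoiser(x, t)
        # x + dt * (p - x) / (1 - t) equals (1 - a) * x + a * p with a = dt / (1 - t).
        # Clamping a to 1 guards against rounding on the last step, so every
        # update is a convex combination and x stays on the simplex.
        a = min(dt / (1.0 - t), 1.0)
        x = [(1.0 - a) * xi + a * pi for xi, pi in zip(x, p)]
    return x


if __name__ == "__main__":
    x_final = euler_sample([0.3, 0.25, 0.25, 0.2], toy_denoiser)
    token = max(range(len(x_final)), key=x_final.__getitem__)
    print("final point:", [round(v, 3) for v in x_final], "decoded token:", token)
```

Each update is a convex combination (1 - a)x + a·p with a ≤ 1, so the iterate never leaves the simplex. A full sampler would apply the same loop to every sequence position and decode with argmax at the end.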
Problem

Scalable generative framework for peptide and protein sequence design
Training-free guidance for controllable biological sequence generation
High-quality, diverse sequence generation in higher-dimensional simplices

Innovation

Gumbel-Softmax Flow Matching for simplex generation
Straight-Through Guided Flows for training-free guidance
Classifier-based guidance with straight-through estimators
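The straight-through idea behind STGFlow can be sketched as follows. A classifier trained on clean sequences is evaluated at the hard one-hot sequence obtained by rounding the current simplex point, and its gradient is passed back unchanged to the soft point. Everything here is a hypothetical illustration. The linear-logistic classifier (`weights`, `bias`), the guidance scale `gamma`, and the step that adds the mean-centered gradient directly to the velocity are assumptions made for this sketch. They are not the exact STGFlow update.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per sequence position, one column per token


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def hard_one_hot(x: Matrix) -> Matrix:
    """Straight-through forward pass: round each row to its argmax vertex."""
    out = []
    for row in x:
        k = max(range(len(row)), key=row.__getitem__)
        out.append([1.0 if j == k else 0.0 for j in range(len(row))])
    return out


def classifier_logit(seq: Matrix, weights: Matrix, bias: float) -> float:
    """Hypothetical clean-sequence classifier: s = bias + sum_i sum_k W[i][k] * seq[i][k]."""
    return bias + sum(
        w * h for w_row, h_row in zip(weights, seq) for w, h in zip(w_row, h_row)
    )


def straight_through_grad(x: Matrix, weights: Matrix, bias: float) -> Matrix:
    """Gradient of log sigmoid(s), computed at the hard sequence and reused for soft x.

    d log sigmoid(s) / ds = 1 - sigmoid(s), and ds / d seq[i][k] = W[i][k].
    The straight-through estimator treats rounding as the identity in the
    backward pass, so this gradient is applied to x directly.
    """
    s = classifier_logit(hard_one_hot(x), weights, bias)
    dlogp_ds = 1.0 - sigmoid(s)
    return [[dlogp_ds * w for w in w_row] for w_row in weights]


def guided_velocity(
    v: Matrix, x: Matrix, weights: Matrix, bias: float, gamma: float = 1.0
) -> Matrix:
    """Add a mean-centered classifier gradient, scaled by gamma, to the velocity v."""
    grad = straight_through_grad(x, weights, bias)
    guided = []
    for v_row, g_row in zip(v, grad):
        mean_g = sum(g_row) / len(g_row)
        # Subtracting the row mean keeps the correction tangent to the simplex.
        guided.append([vk + gamma * (gk - mean_g) for vk, gk in zip(v_row, g_row)])
    return guided


if __name__ == "__main__":
    x = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3]]
    v = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    W = [[2.0, 0.0, -1.0], [0.0, 1.0, 0.0]]
    print(guided_velocity(v, x, W, bias=0.0, gamma=1.0))
```

Centering each gradient row keeps every row of the guided velocity summing to the same value as the unguided one, so the correction does not push samples off the simplex. The velocity is raised on tokens the classifier rewards and lowered on tokens it penalizes.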