🤖 AI Summary
Biological sequence design must balance multiple conflicting objectives, such as binding affinity, toxicity, and stability, yet existing guided methods either optimize a single objective or rely on continuous embeddings that can distort the underlying discrete distribution. This paper introduces MOG-DFM, a multi-objective guidance framework for discrete flow matching on biological sequences. It combines a hybrid rank-directional score with an adaptive hypercone filter, enabling plug-and-play Pareto-oriented guidance on top of pretrained discrete flow matching generators (e.g., PepDFM, EnhancerDFM). Because guidance operates directly in the discrete sequence space, it avoids the mapping biases introduced by continuous latent representations. In experiments, generated peptide binders are optimized jointly across five properties (hemolysis, non-fouling, solubility, half-life, and binding affinity), and designed enhancer DNA sequences are steered toward specified enhancer classes and DNA shapes.
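To make "Pareto optimization" concrete, the following minimal sketch marks which candidates in a batch are non-dominated. It is an illustration only, not code from the paper: the function name `pareto_front_mask` and the convention that every objective is oriented so that higher is better are assumptions made here.

```python
import numpy as np

def pareto_front_mask(scores: np.ndarray) -> np.ndarray:
    """Return a boolean mask over rows of `scores` that are non-dominated.

    `scores` has shape (n_candidates, n_objectives). Assumption: every
    objective is oriented so that larger values are better (a property such
    as hemolysis would be negated before calling this function).
    """
    n = scores.shape[0]
    mask = np.ones(n, dtype=bool)
    for i in range(n):
        # Candidate j dominates i if it is at least as good on every
        # objective and strictly better on at least one.
        at_least_as_good = np.all(scores >= scores[i], axis=1)
        strictly_better = np.any(scores > scores[i], axis=1)
        if np.any(at_least_as_good & strictly_better):
            mask[i] = False
    return mask

# Hypothetical two-objective scores for four candidate sequences.
example_scores = np.array([
    [0.9, 0.1],
    [0.5, 0.5],
    [0.4, 0.4],  # dominated by the candidate at [0.5, 0.5]
    [0.1, 0.9],
])
front = pareto_front_mask(example_scores)
```

A candidate on the front cannot be improved on one objective without giving up another, which is the sense of "trade-off" used throughout.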
📝 Abstract
Designing biological sequences that satisfy multiple, often conflicting, functional and biophysical criteria remains a central challenge in biomolecule engineering. While discrete flow matching models have recently shown promise for efficient sampling in high-dimensional sequence spaces, existing approaches address only single objectives or require continuous embeddings that can distort discrete distributions. We present Multi-Objective-Guided Discrete Flow Matching (MOG-DFM), a general framework to steer any pretrained discrete-time flow matching generator toward Pareto-efficient trade-offs across multiple scalar objectives. At each sampling step, MOG-DFM computes a hybrid rank-directional score for candidate transitions and applies an adaptive hypercone filter to enforce consistent multi-objective progression. We also train two unconditional discrete flow matching models, PepDFM for diverse peptide generation and EnhancerDFM for functional enhancer DNA generation, as base generation models for MOG-DFM. We demonstrate MOG-DFM's effectiveness in generating peptide binders optimized across five properties (hemolysis, non-fouling, solubility, half-life, and binding affinity), and in designing DNA sequences with specific enhancer classes and DNA shapes. Overall, MOG-DFM proves to be a powerful tool for multi-property-guided biomolecule sequence design.
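The abstract names the two guidance components but does not give their formulas, so the sketch below is one plausible reading rather than the authors' implementation. It assumes that each candidate transition has already been scored by K objective predictors (higher is better), that a preference weight vector `w` defines the desired improvement direction, that the "rank" part is a per-objective rank of the improvement, that the "directional" part is the projection of the improvement onto `w`, and that the hypercone filter keeps candidates whose improvement lies within an angle `phi` of `w`. All function names, the mixing weight `lam`, and the angle update rule are hypothetical.

```python
import numpy as np

def hybrid_scores(deltas: np.ndarray, w: np.ndarray, lam: float = 1.0) -> np.ndarray:
    """Hypothetical hybrid score: weighted rank term plus directional term.

    deltas: (C, K) objective improvements of C candidates over the current state.
    w:      (K,) positive preference weights.
    """
    n_cand = deltas.shape[0]
    # Per-objective rank of each candidate (0 = worst); ties are broken arbitrarily.
    ranks = deltas.argsort(axis=0).argsort(axis=0)
    rank_term = (ranks / max(n_cand - 1, 1)) @ (w / w.sum())
    # Projection of the improvement vector onto the unit preference direction.
    dir_term = deltas @ (w / np.linalg.norm(w))
    return rank_term + lam * dir_term

def cone_mask(deltas: np.ndarray, w: np.ndarray, phi: float) -> np.ndarray:
    """Keep candidates whose improvement lies within angle phi (radians) of w."""
    w_unit = w / np.linalg.norm(w)
    norms = np.linalg.norm(deltas, axis=1)
    # Zero-length improvements get cosine -1, i.e. they point "away" from w.
    cos = np.divide(deltas @ w_unit, norms,
                    out=np.full(len(deltas), -1.0), where=norms > 0)
    return cos >= np.cos(phi)

def guided_pick(current, candidates, w, phi, lam=1.0):
    """Return the index of the best in-cone candidate, or None if none qualifies."""
    deltas = candidates - current
    mask = cone_mask(deltas, w, phi)
    if not mask.any():
        return None
    scores = np.where(mask, hybrid_scores(deltas, w, lam), -np.inf)
    return int(np.argmax(scores))

def adapt_angle(phi, accept_rate, target=0.5, eta=0.1,
                phi_min=0.05, phi_max=np.pi / 2):
    """Assumed update: widen the cone when too few candidates pass, narrow otherwise."""
    return float(np.clip(phi * np.exp(eta * (target - accept_rate)), phi_min, phi_max))

# Hypothetical two-objective example with equal preference weights.
current = np.array([0.5, 0.5])
candidates = np.array([[0.7, 0.7], [0.9, 0.1], [0.4, 0.4], [0.6, 0.55]])
choice = guided_pick(current, candidates, w=np.array([1.0, 1.0]), phi=np.pi / 4)
```

In this toy example the second candidate improves one objective at the expense of the other and falls outside the cone, and the third worsens both, so only balanced improvements compete on the hybrid score. In a real sampler the selected candidate would reweight or replace the base model's proposed transition; that coupling is omitted here.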