LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing reward-guided DNA generation methods: most operate on fixed-length sequence spaces and so cannot produce variable-length DNA. The authors propose Local Perturbation Discrete Programming (LPDP), a training-free local re-solving operator applied at inference time to Edit Flows generators, which build sequences through biologically plausible insertion, deletion, and substitution operations. At each guided rollout step, LPDP scores one-step root edits, retains a near-best band of roots, and re-ranks each retained root by solving a bounded local discrete program around its child sequence. The local program uses the typed geometry of edit actions to focus on coherent substitution, insertion, or deletion subgraphs, and it aggregates local continuations with either a hard Max backup or a soft log-sum-exponential (LSE) backup. LPDP is instantiated in two regimes: front-loaded reward tilting for enhancer optimization and back-loaded reward tilting for exon-intron-exon inpainting.

📝 Abstract

We study the application of recent Edit Flows to inference-time reward control for DNA sequence generation. Unlike most reward-guided DNA generation frameworks, which operate on fixed-length sequence spaces, Edit Flows have the potential to generate variable-length DNA through biologically plausible insertion, deletion, and substitution operations. In particular, we propose Local Perturbation Discrete Programming (LPDP), a training-free, intermediate-state and action-aware local re-solving operator for variable-length DNA edit-action generators at inference time. More specifically, at each guided rollout step, LPDP scores one-step root edits, retains a near-best root band, and re-ranks each retained root by solving a bounded local discrete program around its child sequence. This local program uses the typed geometry of edit actions to focus on coherent substitution, insertion, or deletion subgraphs, and aggregates local continuations with either a hard Max backup or a soft log-sum-exponential (LSE) backup. We instantiate LPDP in two regimes: front-loaded reward tilting for enhancer optimization, where early edits are critical for establishing global regulatory sequence structure, and back-loaded reward tilting for exon-intron-exon inpainting, where late edits fine-tune splice-boundary contexts.
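
The guided step described in the abstract (score root edits, keep a near-best band, re-rank by a bounded local program with a Max or LSE backup) can be illustrated with a toy sketch. This is not the authors' implementation. It assumes exhaustive enumeration of single edits in place of an Edit Flows model's proposals, a one-level local program restricted to the same edit type as the root, and a made-up reward. The names `one_step_edits`, `backup`, `lpdp_step`, and `toy_reward`, and the parameters `band`, `tau`, and `max_children`, are all hypothetical.

```python
import math
from typing import Callable, Dict, List

ALPHABET = "ACGT"


def one_step_edits(seq: str) -> Dict[str, List[str]]:
    """Enumerate every single-edit child of seq, grouped by edit type."""
    subs = [seq[:i] + b + seq[i + 1:]
            for i in range(len(seq)) for b in ALPHABET if b != seq[i]]
    ins = [seq[:i] + b + seq[i:]
           for i in range(len(seq) + 1) for b in ALPHABET]
    # Assumption: never delete the last remaining base.
    dels = [seq[:i] + seq[i + 1:] for i in range(len(seq))] if len(seq) > 1 else []
    return {"sub": subs, "ins": ins, "del": dels}


def backup(values: List[float], mode: str = "max", tau: float = 1.0) -> float:
    """Aggregate continuation values with a hard Max or a soft LSE backup."""
    m = max(values)
    if mode == "max":
        return m
    # Numerically stable tau * log(sum(exp(v / tau))).
    return m + tau * math.log(sum(math.exp((v - m) / tau) for v in values))


def lpdp_step(seq: str, reward: Callable[[str], float], band: float = 0.5,
              mode: str = "max", tau: float = 1.0, max_children: int = 64) -> str:
    """One guided step: score roots, keep a near-best band, re-rank locally."""
    # 1. Score every one-step root edit by the reward of its child sequence.
    roots = [(etype, child, reward(child))
             for etype, kids in one_step_edits(seq).items() for child in kids]
    best = max(r for _, _, r in roots)
    # 2. Retain roots whose score lies within `band` of the best root.
    kept = [(e, c, r) for e, c, r in roots if r >= best - band]
    # 3. Re-rank each kept root with a bounded local program around its child:
    #    here, at most `max_children` continuations of the same edit type.
    ranked = []
    for etype, child, r in kept:
        conts = one_step_edits(child)[etype][:max_children]
        values = [reward(c) for c in conts] or [r]
        ranked.append((backup(values, mode, tau), child))
    return max(ranked)[1]


def toy_reward(seq: str) -> float:
    """Hypothetical reward: count TATA motifs, with a small length penalty."""
    return seq.count("TATA") - 0.01 * len(seq)


if __name__ == "__main__":
    seq = "TACA"
    for _ in range(3):
        seq = lpdp_step(seq, toy_reward, mode="lse", tau=0.5)
        print(seq, round(toy_reward(seq), 3))
```

Because the LSE backup sums over continuations, it rewards roots with many good local continuations, while the Max backup looks only at the single best one. In a real setting the enumeration in `one_step_edits` would presumably be replaced by edits proposed by the trained generator, and the reward by a task-specific oracle.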

Problem

Research questions and friction points this paper is trying to address.

DNA sequence generation

variable-length generation

inference-time reward control

edit operations

biological sequence optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Edit Flows

LPDP

inference-time reward control