Continuous Diffusion Transformers for Designing Synthetic Regulatory Elements

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
We propose an approach for efficiently generating 200-bp synthetic regulatory DNA sequences with cell-type specificity while mitigating over-memorization of training data. We introduce a diffusion Transformer (DiT) architecture combined with a 2D CNN input encoder to replace the conventional U-Net backbone, and further enhance functional activity through reinforcement fine-tuning with DDPO and an Enformer-based reward model. Our method matches the previous best-performing U-Net baseline in just 13 training epochs and ultimately converges to a 39% lower validation loss, while decreasing the proportion of memorized sequences from 5.3% to 1.7% and improving predicted regulatory activity 38-fold. The model also generalizes well on an independent prediction task.

📝 Abstract
We present a parameter-efficient Diffusion Transformer (DiT) for generating 200-bp cell-type-specific regulatory DNA sequences. By replacing the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder, our model matches the U-Net's best validation loss in 13 epochs (60$\times$ fewer) and converges 39% lower, while reducing memorization, measured as the fraction of generated sequences aligning to training data via BLAT, from 5.3% to 1.7%. Ablations show the CNN encoder is essential: without it, validation loss increases 70% regardless of positional embedding choice. We further apply DDPO fine-tuning using Enformer as a reward model, achieving a 38$\times$ improvement in predicted regulatory activity. Cross-validation against DRAKES on an independent prediction task confirms that improvements reflect genuine regulatory signal rather than reward model overfitting.
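As a rough illustration of the setup described above (not the authors' implementation), a 200-bp sequence can be one-hot encoded as a 4×200 matrix, the kind of 2D input a CNN encoder consumes, and corrupted by the standard continuous forward diffusion process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\varepsilon$. The sketch below assumes a simple linear $\beta$ schedule; all function names and schedule constants are hypothetical choices, not taken from the paper.

```python
import math
import random

BASES = "ACGT"

def one_hot(seq):
    """One-hot encode a DNA string as a 4 x L matrix (rows = A, C, G, T)."""
    return [[1.0 if base == b else 0.0 for base in seq] for b in BASES]

def alpha_bar(t, T=1000, beta_min=1e-4, beta_max=0.02):
    """Cumulative product of (1 - beta_s) for s <= t under a linear schedule."""
    prod = 1.0
    for s in range(1, t + 1):
        beta = beta_min + (beta_max - beta_min) * (s - 1) / (T - 1)
        prod *= 1.0 - beta
    return prod

def forward_diffuse(x0, t, rng):
    """Sample x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    abar = alpha_bar(t)
    a, b = math.sqrt(abar), math.sqrt(1.0 - abar)
    return [[a * v + b * rng.gauss(0.0, 1.0) for v in row] for row in x0]

rng = random.Random(0)
seq = "".join(rng.choice(BASES) for _ in range(200))
x0 = one_hot(seq)           # clean 4 x 200 input for the denoiser
xt = forward_diffuse(x0, t=500, rng=rng)  # noised input at a mid timestep
```

The denoiser (here, the DiT with its CNN front end) is then trained to recover `x0` (or the noise) from `xt` and the timestep.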
Problem

Research questions and friction points this paper is trying to address.

regulatory DNA sequences
cell-type-specific
sequence generation
memorization
regulatory activity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion Transformer
regulatory DNA sequence generation
parameter-efficient modeling
DDPO fine-tuning
Enformer reward model
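DDPO treats the reverse-diffusion trajectory as a Markov decision process and applies a REINFORCE-style, reward-weighted log-probability gradient at each denoising step, with the reward supplied by a sequence-level predictor (here, Enformer's predicted regulatory activity). The toy sketch below, a one-step Gaussian "policy" standing in for a denoising step and a quadratic stand-in reward, shows only the shape of that update; it is not the paper's method, and every name in it is hypothetical.

```python
import random

def reinforce_step(mu, sigma, reward_fn, rng, n=256, lr=0.05):
    """One reward-weighted policy-gradient update of a Gaussian policy N(mu, sigma^2).

    grad_mu log N(a | mu, sigma) = (a - mu) / sigma^2; DDPO applies the same
    reward-weighted log-prob gradient to each denoising transition.
    """
    samples = [rng.gauss(mu, sigma) for _ in range(n)]
    rewards = [reward_fn(a) for a in samples]
    baseline = sum(rewards) / n  # mean-reward baseline for variance reduction
    grad = sum((r - baseline) * (a - mu) / sigma**2
               for a, r in zip(samples, rewards)) / n
    return mu + lr * grad

rng = random.Random(0)
target = 3.0
reward = lambda a: -(a - target) ** 2  # stand-in for an Enformer-style reward

mu = 0.0
for _ in range(200):
    mu = reinforce_step(mu, 1.0, reward, rng)
# mu drifts toward the reward-maximizing value (here, 3.0)
```

In the full method the "action" is a high-dimensional denoising step rather than a scalar, but the estimator, sampled trajectories reweighted by baseline-subtracted reward, is the same.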
Jonathan Liu
Princeton University
Computer Vision · Natural Language Processing · Mathematics
Kia Ghods
Department of Computer Science, Princeton University