🤖 AI Summary
This work proposes an approach for efficiently generating 200-bp synthetic regulatory DNA sequences with cell-type specificity while mitigating over-memorization of training data. We replace the conventional U-Net backbone with a diffusion Transformer (DiT) architecture combined with a 2D CNN input encoder, and further enhance functional activity through reinforcement fine-tuning via DDPO with an Enformer-based reward model. Our method matches the previous best-performing U-Net baseline's validation loss in just 13 training epochs and ultimately converges to a 39% lower validation loss, while decreasing the proportion of memorized sequences from 5.3% to 1.7% and improving predicted regulatory activity 38-fold. The model also generalizes well on an independent evaluation task.
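The input-encoding idea can be sketched concretely: a 200-bp sequence is one-hot encoded into a 4$\times$200 "image", and a bank of 2D filters spanning the full base axis slides along the length axis to produce per-position embeddings for the transformer denoiser. This is a minimal numpy sketch under assumed hyperparameters; the filter count (64) and width (7) are illustrative choices, not the paper's actual configuration.

```python
import numpy as np

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as a (4, L) one-hot matrix (rows = A,C,G,T)."""
    idx = np.fromiter((BASES.index(b) for b in seq), dtype=int)
    out = np.zeros((4, len(seq)), dtype=np.float32)
    out[idx, np.arange(len(seq))] = 1.0
    return out

def cnn_encode(x, filters):
    """Toy 2D CNN input encoder: filters of shape (d_model, 4, k) span the
    full base axis and slide along the length axis ('same' padding),
    yielding (L, d_model) token embeddings for a transformer denoiser."""
    d_model, _, k = filters.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    L = x.shape[1]
    tokens = np.stack(
        [np.einsum("dbk,bk->d", filters, xp[:, i:i + k]) for i in range(L)]
    )
    return np.maximum(tokens, 0.0)  # ReLU nonlinearity

rng = np.random.default_rng(0)
seq = "".join(rng.choice(list(BASES), size=200))
filters = rng.normal(size=(64, 4, 7)).astype(np.float32)
tokens = cnn_encode(one_hot(seq), filters)
print(tokens.shape)  # (200, 64)
```

The (L, d_model) output is the natural token sequence for a DiT block; the convolution injects local sequence context before any attention is applied.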
📝 Abstract
We present a parameter-efficient Diffusion Transformer (DiT) for generating 200-bp cell-type-specific regulatory DNA sequences. Replacing the U-Net backbone of DNA-Diffusion with a transformer denoiser equipped with a 2D CNN input encoder, our model matches the U-Net's best validation loss in 13 epochs (60$\times$ fewer) and converges to a 39% lower loss, while reducing memorization, measured as the fraction of generated sequences aligning to training data via BLAT, from 5.3% to 1.7%. Ablations show the CNN encoder is essential: without it, validation loss increases by 70% regardless of positional-embedding choice. We further apply DDPO fine-tuning with Enformer as a reward model, achieving a 38$\times$ improvement in predicted regulatory activity. Cross-validation against DRAKES on an independent prediction task confirms that the improvements reflect genuine regulatory signal rather than reward-model overfitting.
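The DDPO step can be illustrated with a toy reward-weighted policy-gradient loop. Full DDPO applies a clipped, importance-weighted update at each denoising timestep, with Enformer's predicted activity as the reward; the sketch below collapses the trajectory to a single categorical sampling step and swaps in a GC-content stand-in reward, purely to show the REINFORCE-style mechanics. All names and sizes here are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 8                       # toy sequence length
logits = np.zeros((L, 4))   # stand-in for the generator's per-position output head
lr, n_samples = 1.0, 64

def softmax(z):
    e = np.exp(z - z.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def gc_reward(onehots):
    """Stand-in reward: GC fraction of each sampled sequence.
    (The real reward is Enformer-predicted regulatory activity.)"""
    return onehots[:, :, 1:3].sum(axis=(1, 2)) / L  # columns 1,2 = C,G

for step in range(400):
    p = softmax(logits)
    # Sample a batch of sequences via inverse-CDF sampling.
    u = rng.random((n_samples, L, 1))
    draws = (u < p.cumsum(-1)).argmax(-1)   # (n_samples, L)
    onehots = np.eye(4)[draws]              # (n_samples, L, 4)
    # REINFORCE with a mean-reward baseline:
    # grad of log p(x) for a categorical is (onehot - p).
    rewards = gc_reward(onehots)
    adv = rewards - rewards.mean()
    logits += lr * (adv[:, None, None] * (onehots - p)).mean(0)

gc_prob = softmax(logits)[:, 1:3].sum(-1).mean()  # avg. probability of G or C
print(gc_prob)
```

After training, the policy's probability mass shifts toward the rewarded bases, the same mechanism by which DDPO steers the denoiser toward sequences Enformer scores highly.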