Regulatory DNA sequence Design with Reinforcement Learning

📅 2025-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address local optima and the lack of biological prior guidance in *de novo* design of cis-regulatory elements (CREs), this study proposes a biology-informed reinforcement learning framework. The biological logic of adding activator transcription factor binding sites (TFBSs) and removing repressor TFBSs is modeled as reward signals that guide the fine-tuning of a pre-trained autoregressive language model. The method combines computational reward modeling, sequence generation, and multi-condition evaluation. Evaluated on promoter design in two yeast media conditions and enhancer design for three human cell types, the approach generates high-fitness CREs while maintaining sequence diversity. The implementation is open source, and the intended applications are in synthetic biology and gene therapy.

📝 Abstract

Cis-regulatory elements (CREs), such as promoters and enhancers, are relatively short DNA sequences that directly regulate gene expression. The fitness of CREs, measured by their ability to modulate gene expression, depends strongly on the nucleotide sequences, especially specific motifs known as transcription factor binding sites (TFBSs). Designing high-fitness CREs is crucial for therapeutic and bioengineering applications. Current CRE design methods are limited by two major drawbacks: (1) they typically rely on iterative optimization strategies that modify existing sequences and are prone to local optima, and (2) they lack the guidance of biological prior knowledge in sequence optimization. In this paper, we address these limitations by proposing a generative approach that leverages reinforcement learning (RL) to fine-tune a pre-trained autoregressive (AR) model. Our method incorporates data-driven biological priors by deriving computational inference-based rewards that simulate the addition of activator TFBSs and removal of repressor TFBSs, which are then integrated into the RL process. We evaluate our method on promoter design tasks in two yeast media conditions and enhancer design tasks for three human cell types, demonstrating its ability to generate high-fitness CREs while maintaining sequence diversity. The code is available at https://github.com/yangzhao1230/TACO.
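
The idea of a reward that favors activator TFBSs and penalizes repressor TFBSs can be illustrated with a minimal sketch. This is not the paper's implementation: the motif strings, the weight `alpha`, and the function names `tfbs_reward` and `shaped_reward` are all hypothetical, and plain motif counting stands in for the inference-based rewards described in the abstract.

```python
# Hypothetical motifs, chosen for illustration only (not real annotations).
ACTIVATOR_MOTIFS = ["TGACTCA"]
REPRESSOR_MOTIFS = ["CCCCAC"]


def count_occurrences(sequence: str, motif: str) -> int:
    """Count (possibly overlapping) occurrences of motif in sequence."""
    width = len(motif)
    return sum(
        1
        for start in range(len(sequence) - width + 1)
        if sequence[start:start + width] == motif
    )


def tfbs_reward(sequence: str, alpha: float = 0.1) -> float:
    """Bonus for each activator motif, penalty for each repressor motif.

    alpha is an assumed weight, not a value taken from the paper.
    """
    activators = sum(count_occurrences(sequence, m) for m in ACTIVATOR_MOTIFS)
    repressors = sum(count_occurrences(sequence, m) for m in REPRESSOR_MOTIFS)
    return alpha * (activators - repressors)


def shaped_reward(sequence: str, fitness: float, alpha: float = 0.1) -> float:
    """Combine a fitness score (e.g. from a predictor) with the TFBS term."""
    return fitness + tfbs_reward(sequence, alpha)
```

In an RL fine-tuning loop, a value like `shaped_reward(generated_sequence, predicted_fitness)` would play the role of the scalar reward for each sequence sampled from the autoregressive model.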

Problem

Research questions and friction points this paper is trying to address.

Design high-fitness cis-regulatory elements (CREs) for gene expression regulation.

Overcome limitations of iterative optimization and lack of biological guidance.

Use reinforcement learning to generate diverse, high-fitness CREs for therapeutic applications.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tunes a pre-trained autoregressive model with reinforcement learning

Incorporates biological priors via computational inference rewards

Generates high-fitness CREs with sequence diversity
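
Sequence diversity can be quantified in several ways. The sketch below uses mean pairwise Hamming distance over equal-length sequences, which is an assumed metric chosen for illustration and not necessarily the one used in the paper; the function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise_hamming(sequences: Sequence[str]) -> float:
    """Average Hamming distance over all unordered pairs of sequences."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        return 0.0  # fewer than two sequences: no diversity to measure
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)
```

A batch of identical sequences scores 0, and the score grows as the generated sequences differ at more positions.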
