Multimodal Modeling of CRISPR-Cas12 Activity Using Foundation Models and Chromatin Accessibility Data

📅 2025-06-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Predicting gRNA activity in CRISPR-Cas12 systems is hindered by limited labeled data, diverse PAM requirements, and reliance on large-scale task-specific training. Method: We propose a multimodal transfer learning framework that requires no domain-specific pre-training: (1) transferring sequence embeddings directly from RNA foundation models pre-trained on transcriptomic data (e.g., Nucleotide Transformer) to gRNA activity prediction; (2) integrating chromatin accessibility features from ATAC-seq and ChIP-seq data; and (3) employing a lightweight fully connected regressor with multi-task joint fine-tuning. Contribution/Results: Our approach consistently outperforms conventional baselines across multiple Cas12 variants and PAM sequences, with average R² improvements of 0.18–0.25. It remains robust in low-data regimes, supporting the practicality of cross-modal transfer from biological foundation models to CRISPR functional prediction.
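For readers unfamiliar with the metric quoted in the summary, the sketch below computes the coefficient of determination (R²) for two sets of predictions and takes their difference. It uses the standard definition, 1 minus the ratio of residual to total sum of squares. The activity scores and both prediction lists are hypothetical values chosen only for illustration; they are not data or results from the paper.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured gRNA activity scores (illustration only).
y_true = [0.2, 0.5, 0.8, 0.4]
# Hypothetical predictions from a baseline and from an improved model.
baseline_pred = [0.4, 0.4, 0.6, 0.5]
model_pred = [0.25, 0.45, 0.75, 0.45]

# An "R² improvement" is the difference between the two scores.
r2_gain = r_squared(y_true, model_pred) - r_squared(y_true, baseline_pred)
print(f"R² gain over baseline: {r2_gain:.2f}")
```

A gain reported this way depends on the evaluation split and on which baseline is used, so figures from different papers are not directly comparable.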

📝 Abstract
Predicting guide RNA (gRNA) activity is critical for effective CRISPR-Cas12 genome editing but remains challenging due to limited data, variation across protospacer adjacent motifs (PAMs; short sequence requirements for Cas binding), and reliance on large-scale training. We investigate whether a pre-trained biological foundation model, originally trained on transcriptomic data, can improve gRNA activity estimation even without domain-specific pre-training. Using embeddings from an existing RNA foundation model as input to a lightweight regressor, we show substantial gains over traditional baselines. We also integrate chromatin accessibility data to capture regulatory context, which improves performance further. Our results highlight the effectiveness of pre-trained foundation models and chromatin accessibility data for gRNA activity prediction.
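The pipeline described in the abstract (frozen sequence embeddings, optionally concatenated with chromatin accessibility features, fed to a lightweight regressor) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are random stand-ins rather than outputs of a real foundation model, the accessibility value is a single synthetic number per target site, the toy activity labels are generated by a made-up formula, and a linear regressor fitted by gradient descent stands in for the paper's regressor. All function names and constants are hypothetical.

```python
import random


def build_features(seq_embedding, accessibility):
    """Concatenate a sequence embedding with accessibility features."""
    return list(seq_embedding) + list(accessibility)


def predict(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def fit_linear_regressor(X, y, lr=0.1, epochs=600, l2=1e-3):
    """Fit a linear model with an L2 penalty by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, target in zip(X, y):
            err = predict(weights, bias, x) - target
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        weights = [w - lr * (g / n + l2 * w) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def mse(weights, bias, X, y):
    return sum((predict(weights, bias, x) - t) ** 2 for x, t in zip(X, y)) / len(X)


# Synthetic stand-in data (assumption: NOT real embeddings or measurements).
rng = random.Random(0)
EMBED_DIM = 8    # real foundation-model embeddings are far larger
N_GUIDES = 100
embeddings = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(N_GUIDES)]
accessibility = [[rng.random()] for _ in range(N_GUIDES)]
# Toy labels that depend on both modalities, plus a little noise.
activity = [
    0.3 * e[0] - 0.2 * e[1] + 0.5 * a[0] + rng.gauss(0, 0.05)
    for e, a in zip(embeddings, accessibility)
]

# Sequence-only features versus sequence plus accessibility features.
X_seq = [build_features(e, []) for e in embeddings]
X_multi = [build_features(e, a) for e, a in zip(embeddings, accessibility)]

w_seq, b_seq = fit_linear_regressor(X_seq, activity)
w_multi, b_multi = fit_linear_regressor(X_multi, activity)
mse_seq = mse(w_seq, b_seq, X_seq, activity)
mse_multi = mse(w_multi, b_multi, X_multi, activity)
print(f"sequence only: {mse_seq:.4f}  with accessibility: {mse_multi:.4f}")
```

In this toy setup the multimodal model fits better only because the synthetic labels were constructed to depend on accessibility; it illustrates the feature-concatenation idea and says nothing about the size of the effect on real data.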
Problem

Research questions and friction points this paper is trying to address.

Improve CRISPR-Cas12 gRNA activity prediction accuracy
Leverage pre-trained RNA foundation models without domain-specific training
Integrate chromatin accessibility data for better regulatory context
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses pre-trained RNA foundation model embeddings
Integrates chromatin accessibility data
Employs lightweight regressor for prediction