🤖 AI Summary
This work proposes JEPA-DNA, a novel genomic foundation model that addresses the limitations of existing approaches relying solely on masked language modeling (MLM) or next-token prediction (NTP), which often fail to capture global functional context and yield biologically fragmented representations. JEPA-DNA introduces the Joint Embedding Predictive Architecture (JEPA) into genomic pretraining for the first time, supervising the CLS token in latent space to predict high-order functional embeddings of masked regions rather than merely reconstructing individual nucleotides. The framework combines JEPA with the MLM and NTP objectives, supporting either training from scratch or continual enhancement of existing models. Experimental results demonstrate that JEPA-DNA consistently outperforms purely generative baselines across multiple genomic benchmark tasks, achieving superior performance in both supervised and zero-shot settings while producing more robust and biologically meaningful representations.
📝 Abstract
Genomic Foundation Models (GFMs) have largely relied on Masked Language Modeling (MLM) or Next Token Prediction (NTP) to learn the language of life. While these paradigms excel at capturing local genomic syntax and fine-grained motif patterns, they often fail to capture the broader functional context, resulting in representations that lack a global biological perspective. We introduce JEPA-DNA, a novel pre-training framework that integrates the Joint-Embedding Predictive Architecture (JEPA) with traditional generative objectives. JEPA-DNA introduces latent grounding: it couples token-level recovery with a predictive objective in latent space, supervised through a CLS token. This forces the model to predict the high-level functional embeddings of masked genomic segments rather than focusing solely on individual nucleotides. JEPA-DNA extends both the NTP and MLM paradigms and can be deployed either as a standalone from-scratch objective or as a continual pre-training enhancement for existing GFMs. Our evaluations across a diverse suite of genomic benchmarks demonstrate that JEPA-DNA consistently yields superior performance on supervised and zero-shot tasks compared to generative-only baselines. By providing a more robust and biologically grounded representation, JEPA-DNA offers a scalable path toward foundation models that understand not only the genomic alphabet, but also the underlying functional logic of the sequence.
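To make the dual objective concrete, here is a minimal NumPy sketch of the two losses the abstract describes: token-level MLM recovery plus a CLS-token prediction of the masked region's latent under a separate target encoder. This is an illustrative toy, not the authors' implementation: the embedding-table "encoders", the mean-pooled target latent, the names `predictor_W` and `mlm_head_W`, and the unit loss weighting are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB = {tok: i for i, tok in enumerate(["A", "C", "G", "T", "[MASK]", "[CLS]"])}
D = 16  # toy embedding dimension

# Toy "encoders": plain embedding tables standing in for transformer encoders.
# In JEPA-style setups the target encoder is typically an exponential moving
# average (EMA) of the online encoder; the momentum update is not shown here.
online_emb = rng.normal(0, 0.1, (len(VOCAB), D))
target_emb = online_emb.copy()             # EMA copy (frozen, stop-gradient)
predictor_W = rng.normal(0, 0.1, (D, D))   # maps CLS embedding -> predicted latent
mlm_head_W = rng.normal(0, 0.1, (D, 4))    # maps token embedding -> nucleotide logits

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def jepa_dna_losses(seq, mask_span):
    """Compute the two JEPA-DNA-style objectives for one sequence.

    seq: list of nucleotides, e.g. ["A", "C", ...]
    mask_span: (start, end) indices of the contiguous masked region
    """
    s, e = mask_span
    ids = [VOCAB[c] for c in seq]
    masked_ids = [VOCAB["[CLS]"]] + [
        VOCAB["[MASK]"] if s <= i < e else t for i, t in enumerate(ids)
    ]

    # Online encoder sees the masked sequence (toy: embedding lookup only).
    h = online_emb[masked_ids]             # (L+1, D)
    cls_h = h[0]

    # 1) MLM objective: recover each masked nucleotide token by token.
    logits = h[1:][s:e] @ mlm_head_W       # (span_len, 4)
    probs = softmax(logits)
    targets = [ids[i] for i in range(s, e)]
    mlm_loss = -np.mean(np.log(probs[np.arange(e - s), targets]))

    # 2) JEPA objective: the CLS token predicts the *latent* summary of the
    #    masked region under the target encoder, not the raw nucleotides.
    target_latent = target_emb[targets].mean(axis=0)   # (D,)
    pred_latent = cls_h @ predictor_W
    jepa_loss = np.mean((pred_latent - target_latent) ** 2)

    return mlm_loss, jepa_loss

seq = list("ACGTACGTACGT")
mlm_loss, jepa_loss = jepa_dna_losses(seq, mask_span=(4, 8))
total = mlm_loss + 1.0 * jepa_loss  # weighting coefficient is an assumption
```

The key design point the sketch mirrors is that the JEPA term is computed in embedding space against a stop-gradient target encoder, which is what pushes the CLS representation toward a functional summary of the masked span rather than a per-nucleotide reconstruction.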