Var-JEPA: A Variational Formulation of the Joint-Embedding Predictive Architecture -- Bridging Predictive and Generative Self-Supervised Learning

📅 2026-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the disconnect between Joint-Embedding Predictive Architectures (JEPA) and probabilistic generative modeling, as well as JEPA's reliance on heuristic regularization to prevent representation collapse. From a variational-inference perspective, we reinterpret JEPA as a deterministic special case of a coupled latent-variable model and, for the first time, integrate it into a variational autoencoding framework, yielding Var-JEPA. This approach introduces an explicit generative structure that unifies predictive and generative self-supervised learning, enabling meaningful representations without heuristic anti-collapse regularizers and supporting uncertainty quantification in the latent space. Leveraging ELBO optimization and a context–target prediction architecture, we instantiate Var-T-JEPA for tabular data, which significantly outperforms T-JEPA on downstream tasks and matches the performance of strong baselines trained on raw features.

📝 Abstract
The Joint-Embedding Predictive Architecture (JEPA) is often seen as a non-generative alternative to likelihood-based self-supervised learning, emphasizing prediction in representation space rather than reconstruction in observation space. We argue that the resulting separation from probabilistic generative modeling is largely rhetorical rather than structural: the canonical JEPA design (coupled encoders with a context-to-target predictor) mirrors the variational posteriors and learned conditional priors obtained when variational inference is applied to a particular class of coupled latent-variable models. Standard JEPA can then be viewed as a deterministic specialization in which regularization is imposed via architectural and training heuristics rather than an explicit likelihood. Building on this view, we derive the Variational JEPA (Var-JEPA), which makes the latent generative structure explicit by optimizing a single Evidence Lower Bound (ELBO). This yields meaningful representations without ad hoc anti-collapse regularizers and allows principled uncertainty quantification in the latent space. We instantiate the framework for tabular data (Var-T-JEPA) and achieve strong representation learning and downstream performance, consistently improving over T-JEPA while remaining competitive with strong raw-feature baselines.
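The coupled latent-variable reading in the abstract can be made concrete with a generic conditional ELBO from standard variational inference. The following is a sketch under assumed notation ($x_c$ for the context view, $x_t$ for the target view, $z$ for the latent), not the paper's exact objective:

```latex
% Generic conditional ELBO for a coupled latent-variable model (a sketch;
% the symbols x_c, x_t, z are illustrative, not taken from the paper):
\log p_\theta(x_t \mid x_c)
  \;\ge\;
  \mathbb{E}_{q_\phi(z \mid x_t)}\!\left[\log p_\theta(x_t \mid z)\right]
  \;-\;
  \mathrm{KL}\!\left(q_\phi(z \mid x_t)\,\big\|\,p_\theta(z \mid x_c)\right)
```

In this reading, the variational posterior $q_\phi(z \mid x_t)$ plays the role of the target encoder, while the learned conditional prior $p_\theta(z \mid x_c)$ corresponds to the context encoder composed with the predictor; collapsing both distributions to point masses recovers the deterministic context-to-target prediction of standard JEPA, which is the specialization the abstract describes.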
Problem

Research questions and friction points this paper is trying to address.

Joint-Embedding Predictive Architecture
Self-Supervised Learning
Generative Modeling
Variational Inference
Representation Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Variational JEPA
Joint-Embedding Predictive Architecture
Evidence Lower Bound
Self-Supervised Learning
Latent Variable Models