SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures

📅 2025-04-22

📈 Citations: 0

✨ Influential: 0

career value

212K/year

🤖 AI Summary

While Joint-Embedding Predictive Architecture (JEPA) excels in general-purpose representation learning, its dense embeddings suffer from poor interpretability and low computational efficiency. Method: We propose Sparse-JEPA—the first JEPA variant incorporating structured sparsity regularization and grouped latent variable sharing. Specifically, we enforce group-wise sparsity constraints to encourage semantically related features to share latent variables, thereby enhancing representational disentanglement and compactness without compromising predictive fidelity. We theoretically establish that grouping reduces latent variable multi-information and satisfies the data processing inequality. Results: Pretraining a lightweight Vision Transformer on CIFAR-100 with Sparse-JEPA yields substantial gains in linear-probe classification accuracy. Moreover, representations exhibit improved generalization to downstream tasks and are more interpretable, object-centric, and semantically disentangled—demonstrating both empirical efficacy and principled design.

Technology Category

Application Category

📝 Abstract

Joint Embedding Predictive Architectures (JEPA) have emerged as a powerful framework for learning general-purpose representations. However, these models often lack interpretability and suffer from inefficiencies due to dense embedding representations. We propose SparseJEPA, an extension that integrates sparse representation learning into the JEPA framework to enhance the quality of learned representations. SparseJEPA employs a penalty method that encourages latent space variables to be shared among data features with strong semantic relationships, while maintaining predictive performance. We demonstrate the effectiveness of SparseJEPA by training on the CIFAR-100 dataset and pre-training a lightweight Vision Transformer. The improved embeddings are utilized in linear-probe transfer learning for both image classification and low-level tasks, showcasing the architecture's versatility across different transfer tasks. Furthermore, we provide a theoretical proof that demonstrates that the grouping mechanism enhances representation quality. This was done by displaying that grouping reduces Multiinformation among latent-variables, including proofing the Data Processing Inequality for Multiinformation. Our results indicate that incorporating sparsity not only refines the latent space but also facilitates the learning of more meaningful and interpretable representations. In further work, hope to further extend this method by finding new ways to leverage the grouping mechanism through object-centric representation learning.

Problem

Research questions and friction points this paper is trying to address.

Enhances interpretability and efficiency in Joint Embedding Predictive Architectures

Improves representation quality via sparse learning and semantic feature grouping

Demonstrates versatility in transfer tasks like image classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates sparse representation learning into JEPA

Uses penalty method for semantic feature sharing

Enhances interpretability with grouping mechanism

🔎 Similar Papers

Joint-Embedding Masked Autoencoder for Self-supervised Learning of Dynamic Functional Connectivity from the Human Brain