🤖 AI Summary
This work proposes RadJEPA, a self-supervised representation learning method for chest X-rays that requires no image–text paired data. RadJEPA introduces the Joint Embedding Predictive Architecture (JEPA) to medical imaging for the first time, replacing conventional global-representation alignment with prediction of masked regions' latent representations. By explicitly capturing local semantic structure and eliminating reliance on language-based supervision, the method achieves substantial gains over existing approaches such as Rad-DINO. Evaluated across multiple downstream tasks—including disease classification, semantic segmentation, and radiology report generation—RadJEPA sets a new state of the art, demonstrating its effectiveness at learning rich, transferable representations from unlabeled medical images.
📝 Abstract
Recent advances in medical vision–language models use language to guide the learning of visual representations; however, this form of supervision is constrained by the availability of paired image–text data, raising the question of whether robust radiology encoders can be learned without relying on language supervision. In this work, we introduce RadJEPA, a self-supervised framework built on a Joint Embedding Predictive Architecture that learns without language supervision. Pre-trained solely on unlabeled chest X-ray images, the model learns to predict latent representations of masked image regions. This predictive objective differs fundamentally from both image–text pre-training and DINO-style self-distillation: rather than aligning global representations across views or modalities, RadJEPA explicitly models latent-space prediction. We evaluate the learned encoder on disease classification, semantic segmentation, and report generation tasks. Across benchmarks, RadJEPA outperforms state-of-the-art approaches, including Rad-DINO.
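The masked-latent-prediction objective described above can be sketched in a few lines. The following toy example is an illustrative assumption, not the paper's implementation: the encoders are stand-in random linear maps rather than vision transformers, and the context/target block sampling of JEPA is reduced to a simple boolean patch mask.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (assumptions, not taken from the paper).
num_patches, patch_dim, latent_dim = 16, 32, 8

# Toy "encoders": random linear maps standing in for ViT encoders.
W_context = rng.normal(size=(patch_dim, latent_dim))
W_target = W_context.copy()        # target encoder starts as a copy
W_predictor = rng.normal(size=(latent_dim, latent_dim))

patches = rng.normal(size=(num_patches, patch_dim))  # one image as patches
mask = rng.random(num_patches) < 0.5                 # patches to predict

# Encode all patches with both encoders; in practice the target branch
# receives no gradients, and the context branch sees only visible patches.
context_latents = patches @ W_context
target_latents = patches @ W_target

# Predict the latent representations of the masked patches and compare
# them to the target encoder's latents in latent space (L2 loss).
pred = context_latents[mask] @ W_predictor
loss = np.mean((pred - target_latents[mask]) ** 2)

# The target encoder is updated as an exponential moving average of the
# context encoder rather than by gradient descent (standard JEPA practice).
ema_decay = 0.996
W_target = ema_decay * W_target + (1 - ema_decay) * W_context
```

The key design choice this sketch highlights is that the loss is computed between latent vectors, not pixels: the model is never asked to reconstruct the masked X-ray regions themselves, only their representations under a slowly moving target encoder.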