Discrete JEPA: Learning Discrete Token Representations without Reconstruction

📅 2025-06-17

🤖 AI Summary

Existing image tokenization methods struggle to support symbolic abstraction and logical reasoning, which limits systematic visual reasoning. To address this, we propose a robust discrete semantic tokenization framework explicitly designed for symbolic reasoning. Our approach is the first to extend the Joint Embedding Predictive Architecture (JEPA) to discrete token space. It integrates vector quantization (VQ), semantic consistency constraints, and a multi-step future-token prediction objective, and it deliberately abandons pixel-level reconstruction in order to learn structured semantic representations directly. Evaluated on visual symbol prediction tasks, our method significantly outperforms mainstream baselines. The resulting token space exhibits interpretable regularity and systematic compositional patterns, enabling principled symbolic manipulation. This work establishes a novel paradigm for building visual world models capable of explicit, interpretable, and systematic reasoning.
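The vector quantization step mentioned in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: it only shows the generic nearest-codebook lookup that turns continuous embeddings into discrete token ids. The function names (`squared_distance`, `quantize`), the toy codebook, and the toy embeddings are all hypothetical and chosen purely for illustration.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    embeddings: Sequence[Sequence[float]],
    codebook: Sequence[Sequence[float]],
) -> Tuple[List[int], List[List[float]]]:
    """Map each continuous embedding to its nearest codebook entry.

    Returns the discrete token ids and the corresponding quantized vectors.
    """
    token_ids: List[int] = []
    quantized: List[List[float]] = []
    for vec in embeddings:
        distances = [squared_distance(vec, code) for code in codebook]
        idx = distances.index(min(distances))  # first index of the closest code
        token_ids.append(idx)
        quantized.append(list(codebook[idx]))
    return token_ids, quantized


# Hypothetical toy data: a 3-entry codebook of 2-D codes and two encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
embeddings = [[0.9, 1.2], [-0.2, 0.1]]

token_ids, quantized = quantize(embeddings, codebook)
print(token_ids)  # [1, 0]
```

In a trained system the codebook would be learned jointly with the encoder, and the token ids, rather than pixels, would serve as prediction targets. How Discrete-JEPA does this in detail is not specified in this summary.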

📝 Abstract

The cornerstone of cognitive intelligence lies in extracting hidden patterns from observations and leveraging these principles to systematically predict future outcomes. However, current image tokenization methods demonstrate significant limitations in tasks requiring the symbolic abstraction and logical reasoning capabilities that are essential for systematic inference. To address this challenge, we propose Discrete-JEPA, which extends the latent predictive coding framework with semantic tokenization and novel complementary objectives to create robust tokenization for symbolic reasoning tasks. Discrete-JEPA dramatically outperforms baselines on visual symbolic prediction tasks, while striking visual evidence reveals the spontaneous emergence of deliberate systematic patterns within the learned semantic token space. Though an initial model, our approach promises significant impact on advancing symbolic world modeling and planning capabilities in artificial intelligence systems.

Problem

Research questions and friction points this paper is trying to address.

Enhancing symbolic abstraction in image tokenization

Improving logical reasoning for systematic inference

Advancing symbolic world modeling in AI systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends latent predictive coding framework

Introduces semantic tokenization for symbolic reasoning

Uses novel complementary objectives for robustness
