Single-pass Adaptive Image Tokenization for Minimum Program Search

📅 2025-07-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing vision representation learning methods typically use fixed-length tokenization, ignoring variations in image complexity and familiarity. Method: We propose KARL, a single-forward-pass adaptive image tokenizer that treats Kolmogorov complexity as a learnable stopping criterion, enabling search-free prediction of the appropriate token count in line with the Minimum Description Length principle. Built on an encoder-decoder architecture, KARL combines approximate Kolmogorov complexity estimation, conditional halting prediction, and Upside-Down Reinforcement Learning-style training, supporting both continuous and discrete tokenization. Contribution/Results: Experiments show that KARL matches state-of-the-art adaptive tokenizers while substantially improving inference efficiency. Predicted token counts scale sensibly with image complexity, including structural intricacy, noise level, and out-of-distribution content, and the learned complexity estimates correlate strongly with human perceptual intuition.

📝 Abstract
According to Algorithmic Information Theory (AIT), intelligent representations compress data into the shortest possible program that can reconstruct its content, exhibiting low Kolmogorov Complexity (KC). In contrast, most visual representation learning systems use fixed-length representations for all inputs, ignoring variations in complexity or familiarity. Recent adaptive tokenization methods address this by allocating variable-length representations but typically require test-time search over multiple encodings to find the most predictive one. Inspired by Kolmogorov Complexity principles, we propose a single-pass adaptive tokenizer, KARL, which predicts the appropriate number of tokens for an image in a single forward pass, halting once its approximate KC is reached. The token count serves as a proxy for the minimum description length. KARL's training procedure closely resembles the Upside-Down Reinforcement Learning paradigm, as it learns to conditionally predict token halting based on a desired reconstruction quality. KARL matches the performance of recent adaptive tokenizers while operating in a single pass. We present scaling laws for KARL, analyzing the role of encoder/decoder size, continuous vs. discrete tokenization, and more. Additionally, we offer a conceptual study drawing an analogy between adaptive image tokenization and Algorithmic Information Theory, examining the predicted image complexity (KC) across axes such as structure vs. noise and in- vs. out-of-distribution familiarity, revealing alignment with human intuition.
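The abstract's core mechanism can be sketched in a few lines: encode once, read off per-token halting scores, and keep tokens only up to the first score that crosses a threshold, so the token count is decided in a single forward pass with no test-time search. Everything below is an illustrative assumption, not the paper's implementation: the random-projection "encoder", the pixel-variance complexity proxy, and all names are hypothetical stand-ins for learned components.

```python
import numpy as np

rng = np.random.default_rng(0)

MAX_TOKENS = 16
D = 8  # token dimension (hypothetical)

def encode(image):
    """Stand-in encoder: maps an image to MAX_TOKENS latent tokens plus a
    per-token halting score in [0, 1]. The real encoder is learned; here a
    fixed random projection just demonstrates the interface."""
    w = rng.standard_normal((image.size, MAX_TOKENS * D))
    tokens = (image.ravel() @ w).reshape(MAX_TOKENS, D)
    # Hypothetical halting head: use pixel variance as a crude complexity
    # proxy, so "harder" images push the stopping point later -- mimicking
    # a learned approximate-KC estimate.
    complexity = float(np.var(image))
    shift = np.arange(MAX_TOKENS) - complexity * MAX_TOKENS
    halt_scores = 1.0 / (1.0 + np.exp(-shift))
    return tokens, halt_scores

def adaptive_tokenize(image, threshold=0.5):
    tokens, halt_scores = encode(image)
    # Single pass: keep tokens up to the first index whose halting score
    # crosses the threshold; no search over multiple encodings.
    k = int(np.argmax(halt_scores >= threshold)) + 1
    return tokens[:k]

flat = np.full((8, 8), 0.5)   # low-complexity image (constant)
busy = rng.random((8, 8))     # higher-complexity image (noise)
print(len(adaptive_tokenize(flat)), len(adaptive_tokenize(busy)))
```

Under this toy complexity proxy, the constant image halts after the first token while the noisy image is allotted more, matching the paper's claim that token counts track image complexity.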
Problem

Research questions and friction points this paper is trying to address.

Adaptive image tokenization for variable complexity inputs
Single-pass prediction of optimal token count per image
Aligning visual representation with Kolmogorov Complexity principles
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-pass adaptive tokenizer for images
Predicts token count via Kolmogorov Complexity
Uses Upside-Down Reinforcement Learning training
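The Upside-Down RL connection can be made concrete with a hindsight-relabeling sketch: rather than asking how many tokens a desired quality needs, record the quality each token budget actually achieved and train the model to map (image, desired quality) to a halting point. The `recon_error` decay curve and all names below are illustrative assumptions standing in for a real decoder.

```python
import numpy as np

MAX_TOKENS = 16

def recon_error(image, k):
    """Hypothetical stand-in: reconstruction error shrinks as more tokens
    are spent. A real system would decode the first k tokens and compare
    against the input image."""
    complexity = float(np.var(image))
    return complexity * np.exp(-k / 4.0)

def make_udrl_examples(image):
    """Upside-Down-RL-style relabeling: for each token budget k, record the
    quality it achieved as the conditioning 'command', yielding supervised
    pairs (image, desired quality) -> halt-at-k."""
    examples = []
    for k in range(1, MAX_TOKENS + 1):
        achieved = recon_error(image, k)
        examples.append({"command_quality": achieved, "halt_at": k})
    return examples

rng = np.random.default_rng(1)
ex = make_udrl_examples(rng.random((8, 8)))
print(ex[0], ex[-1])
```

The resulting dataset is monotone in the expected direction: stricter quality commands (lower error) are paired with later halting points, which is exactly the conditional halting behavior the summary attributes to KARL's training.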