Structured State-Space Regularization for Compact and Generation-Friendly Image Tokenization

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing image tokenizers struggle to balance compactness with generation-friendliness. This work proposes a theory-driven regularizer that transfers the frequency awareness of state-space models (SSMs) to image tokenization by guiding the tokenizer's latent features to mimic SSM hidden-state dynamics. The regularizer encourages latent representations that encode both fine spatial structure and frequency-domain cues. The resulting latent space improves the generation quality of diffusion models while incurring only a minimal loss in reconstruction fidelity, yielding a more compact and generation-friendly image representation.


📝 Abstract

Image tokenizers are central to modern vision models because these models often operate in latent spaces. An ideal latent space must be simultaneously compact and generation-friendly: it should capture an image's essential content compactly while remaining easy to model with generative approaches. In this work, we introduce a novel regularizer that aligns latent spaces with these two objectives. The key idea is to guide tokenizers to mimic the hidden-state dynamics of state-space models (SSMs), thereby transferring a critical SSM property, frequency awareness, to latent features. Grounded in a theoretical analysis of SSMs, our regularizer enforces the encoding of fine spatial structures and frequency-domain cues into compact latent features, leading to more effective use of representation capacity and improved generative modelability. Experiments demonstrate that our method improves generation quality in diffusion models while incurring only a minimal loss in reconstruction fidelity.
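The abstract does not specify the SSM parameterization or the exact form of the regularizer. As a rough illustration only, the sketch below assumes a diagonal complex linear SSM, in which each hidden state behaves like a damped oscillator tuned to one frequency (one common way SSMs acquire frequency awareness). It pairs this with a hypothetical alignment loss that measures how well tokenizer latents linearly predict those hidden states. The function names `ssm_hidden_states` and `alignment_loss`, the eigenvalue choice, the damping `alpha`, and the least-squares projection are all assumptions made for illustration. None of them is the authors' method.

```python
import numpy as np


def ssm_hidden_states(x, omegas, alpha=0.05, dt=1.0):
    """Run a diagonal complex linear SSM over a 1-D input sequence x.

    Hypothetical parameterization: state n has continuous-time eigenvalue
    lam_n = -alpha + 1j * omegas[n], i.e. a damped oscillator tuned to
    angular frequency omegas[n] (radians per step when dt = 1).
    Returns a complex array of shape (len(x), len(omegas)).
    """
    lam = -alpha + 1j * np.asarray(omegas, dtype=float)
    a_bar = np.exp(lam * dt)        # zero-order-hold discretization of A
    b_bar = (a_bar - 1.0) / lam     # zero-order-hold discretization of B (B = 1)
    h = np.zeros(lam.shape, dtype=complex)
    states = np.empty((len(x), lam.size), dtype=complex)
    for k, x_k in enumerate(x):
        h = a_bar * h + b_bar * x_k  # h_k = A_bar * h_{k-1} + B_bar * x_k
        states[k] = h
    return states


def alignment_loss(latents, states):
    """Hypothetical regularizer: mean squared residual of the best linear
    map from tokenizer latents (T, d) to SSM hidden states (T, n).

    A value near zero means the latents linearly encode the SSM states.
    """
    target = np.concatenate([states.real, states.imag], axis=1)  # (T, 2n) real
    w, *_ = np.linalg.lstsq(latents, target, rcond=None)          # least-squares projection
    residual = target - latents @ w
    return float(np.mean(residual ** 2))


if __name__ == "__main__":
    omegas = [0.5, 1.0, 1.5, 2.0]
    x = np.cos(1.0 * np.arange(400))                # input oscillating at 1.0 rad/step
    states = ssm_hidden_states(x, omegas)
    response = np.abs(states[-200:]).mean(axis=0)   # steady-state magnitude per channel
    print(int(np.argmax(response)))                 # expected: 1 (channel tuned to 1.0)

    exact = np.concatenate([states.real, states.imag], axis=1)
    print(alignment_loss(exact, states))            # near zero: latents contain the states
    rng = np.random.default_rng(0)
    print(alignment_loss(rng.standard_normal((400, 2)), states))  # much larger
```

The frequency-selective response is what "frequency awareness" refers to in this toy setting. A sinusoidal input mostly excites the state whose oscillation frequency matches it. In a real tokenizer, the projection would more plausibly be a learned layer trained jointly with the encoder under autodiff, and the input sequence would come from a scan over image patches. The closed-form least-squares fit is used here only to keep the sketch dependent on nothing beyond NumPy.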

Problem

Research questions and friction points this paper is trying to address.

image tokenization

latent space

compact representation

generation-friendly

state-space models

Innovation

Methods, ideas, or system contributions that make the work stand out.

state-space models

image tokenization

frequency awareness