(1D) Ordered Tokens Enable Efficient Test-Time Search

📅 2026-04-16

🤖 AI Summary
This work asks whether token structure affects how well test-time search can steer autoregressive (AR) generation. Using image generation as a testbed, it compares classical 2D grid tokens with 1D ordered tokens that follow a coarse-to-fine structure, whose intermediate states carry semantic meaning that a verifier can evaluate. The study combines a 1D ordered tokenizer, AR modeling, an image-text verifier, and classical search strategies (best-of-N, beam search, and lookahead search) to examine how token structure interacts with the search algorithm, the verifier, and the AR prior. In controlled experiments, AR models trained on coarse-to-fine ordered tokens show improved test-time scaling compared with grid-based counterparts. The ordered structure also allows pure test-time search over token sequences, guided by an image-text verifier and without training an AR model, to perform training-free text-to-image generation.

📝 Abstract
Tokenization is a key component of autoregressive (AR) generative models, converting raw data into more manageable units for modeling. Commonly, tokens describe local information, such as regions of pixels in images or word pieces in text, and AR generation predicts these tokens in a fixed order. A worthwhile question is whether token structures affect the ability to steer the generation through test-time search, where multiple candidate generations are explored and evaluated by a verifier. Using image generation as our testbed, we hypothesize that recent 1D ordered tokenizers with coarse-to-fine structure can be more amenable to search than classical 2D grid structures. This is rooted in the fact that the intermediate states in coarse-to-fine sequences carry semantic meaning that verifiers can reliably evaluate, enabling effective steering during generation. Through controlled experiments, we find that AR models trained on coarse-to-fine ordered tokens exhibit improved test-time scaling behavior compared to grid-based counterparts. Moreover, we demonstrate that, thanks to the ordered structure, pure test-time search over token sequences (i.e., without training an AR model) can perform training-free text-to-image generation when guided by an image-text verifier. Beyond this, we systematically study how classical search algorithms (best-of-N, beam search, lookahead search) interact with different token structures, as well as the role of different verifiers and AR priors. Our results highlight the impact of token structure on inference-time scalability and provide practical guidance for test-time scaling in AR models.
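
As a rough illustration of the verifier-guided search the abstract describes, the following minimal sketch contrasts best-of-N with beam search on a toy token sequence. Everything here is an assumption made for illustration and is not taken from the paper: the vocabulary size, the `TARGET` sequence, the uniform stand-in for an AR prior, and the `verifier_score` function, which stands in for an image-text verifier by rewarding matches with `TARGET` and weighting early (coarse) positions more heavily.

```python
import random

# Everything below is a hypothetical toy setup, not the paper's models.
VOCAB_SIZE = 8
TARGET = [3, 1, 4, 1, 5, 2]  # stand-in for "the sequence the prompt wants"
SEQ_LEN = len(TARGET)


def verifier_score(tokens):
    """Toy verifier: reward matches with TARGET, weighting early positions more.

    It also accepts prefixes, mimicking intermediate states that can be scored.
    """
    return sum(SEQ_LEN - i for i, (t, g) in enumerate(zip(tokens, TARGET)) if t == g)


def best_of_n(n, seed=0):
    """Sample n full sequences from a uniform stand-in prior and keep the best."""
    rng = random.Random(seed)
    candidates = [
        [rng.randrange(VOCAB_SIZE) for _ in range(SEQ_LEN)] for _ in range(n)
    ]
    return max(candidates, key=verifier_score)


def beam_search(beam_width):
    """Grow prefixes one token at a time, keeping the top-scoring prefixes."""
    beams = [[]]
    for _ in range(SEQ_LEN):
        # Enumerate every next token instead of sampling from an AR prior.
        expanded = [prefix + [tok] for prefix in beams for tok in range(VOCAB_SIZE)]
        expanded.sort(key=verifier_score, reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


if __name__ == "__main__":
    print("best-of-16 score:", verifier_score(best_of_n(16)))
    print("beam search (width 2):", beam_search(2))
```

In this toy, beam search recovers `TARGET` only because the verifier gives a useful signal on every prefix. With a verifier that is uninformative on partial sequences, pruning early would be no better than chance, which is the intuition behind the abstract's hypothesis about coarse-to-fine tokens.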

Problem

Research questions and friction points this paper is trying to address.

token structure
test-time search
autoregressive models
image generation
verifier-guided generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

1D ordered tokens
coarse-to-fine tokenization
test-time search
autoregressive generation
verifier-guided generation