🤖 AI Summary
Patch ordering in Vision Transformers significantly impacts model performance, yet the conventional row-major (raster-scan) sequence is often suboptimal. This work reframes patch ordering as a task-driven, learnable combinatorial decision problem. The proposed framework, REOrder, proceeds in two stages: (1) it derives an information-theoretic prior by measuring the compressibility of candidate patch sequences, and (2) it learns a Plackett–Luce policy over permutations, optimized with REINFORCE, to discover task-optimal orderings. REOrder breaks from fixed-order pipelines and adapts end-to-end to long-sequence Vision Transformer architectures. On ImageNet-1K it improves Top-1 accuracy over a row-major baseline by up to 3.01%, and on the Functional Map of the World dataset by up to 13.35%, underscoring the role of data-adaptive patch ordering in visual representation learning.
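The core optimization machinery can be sketched in a few lines. The toy below (a minimal sketch, not the paper's implementation; the item count, learning rate, and reward are illustrative) samples permutations from a Plackett–Luce distribution via the Gumbel trick and updates the logits with a REINFORCE gradient:

```python
import numpy as np

def sample_permutation(scores, rng):
    """Sample a permutation from a Plackett-Luce distribution with logits
    `scores`: sorting Gumbel-perturbed logits is an exact sampler."""
    gumbels = rng.gumbel(size=scores.shape)
    return np.argsort(-(scores + gumbels))

def log_prob(scores, perm):
    """log P(perm) = sum_k [ s_{perm_k} - logsumexp(s_{perm_k..perm_n}) ]."""
    s = scores[perm]
    # logsumexp over each suffix of the chosen ordering, right-to-left.
    suffix_lse = np.logaddexp.accumulate(s[::-1])[::-1]
    return float(np.sum(s - suffix_lse))

def grad_log_prob(scores, perm):
    """Analytic gradient of log P(perm) w.r.t. the logits; REINFORCE scales
    this by the (baseline-subtracted) reward."""
    g = np.zeros_like(scores)
    for k in range(len(perm)):
        rem = perm[k:]  # items still unplaced at step k
        p = np.exp(scores[rem] - np.logaddexp.reduce(scores[rem]))
        g[rem] -= p          # minus the softmax over remaining items
        g[perm[k]] += 1.0    # plus one for the item actually chosen
    return g

# Toy REINFORCE loop: reward = 1 when item 0 is ranked first.
rng = np.random.default_rng(0)
scores = np.zeros(6)
baseline, lr = 0.0, 0.5
for step in range(300):
    perm = sample_permutation(scores, rng)
    reward = 1.0 if perm[0] == 0 else 0.0
    baseline += 0.1 * (reward - baseline)  # running-mean baseline
    scores += lr * (reward - baseline) * grad_log_prob(scores, perm)
```

After a few hundred updates the logit for item 0 dominates, so sampled permutations place it first with high probability. The Gumbel-sort sampler and the suffix-logsumexp log-probability are what make the combinatorial permutation space tractable: both cost O(n log n) rather than enumerating n! orderings.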
📝 Abstract
Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically means flattening images in a fixed row-major (raster-scan) order. While full self-attention is permutation-equivariant, modern long-sequence transformers increasingly rely on architectural approximations that break this invariance and introduce sensitivity to patch ordering. We show that patch order significantly affects model performance in such settings, with simple alternatives such as column-major or Hilbert-curve orderings yielding notable accuracy shifts. Motivated by this, we propose REOrder, a two-stage framework for discovering task-optimal patch orderings. First, we derive an information-theoretic prior by evaluating the compressibility of various patch sequences. Then, we learn a Plackett–Luce policy over permutations, optimized with REINFORCE, which makes learning in the combinatorial permutation space tractable. REOrder improves top-1 accuracy over row-major ordering by up to 3.01% on ImageNet-1K and by up to 13.35% on Functional Map of the World.
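The compressibility prior in the first stage can be illustrated with a toy probe (a minimal sketch under assumptions, not the paper's estimator; the synthetic image and use of `zlib` are illustrative): serialize the image under each candidate scan order and use the compressed byte length as a redundancy score, with shorter meaning the order better exposes the image's structure.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8-bit "image": one random row tiled vertically, so every column
# is constant and every row repeats the same 64-byte pattern.
img = np.tile(rng.integers(0, 256, size=(1, 64), dtype=np.uint8), (64, 1))

def compressed_len(seq):
    """Byte length of the zlib-compressed serialization of a 1-D scan."""
    return len(zlib.compress(seq.tobytes(), 9))

row_major = compressed_len(img.reshape(-1))      # raster scan
col_major = compressed_len(img.T.reshape(-1))    # column scan
random_order = compressed_len(rng.permutation(img.reshape(-1)))
```

On this image, both structure-aligned scans compress far better than a random ordering, which is exactly the kind of signal an information-theoretic prior can exploit to rank candidate orderings before any gradient-based learning.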