Dynamic Tokenization via Reinforcement Patching: End-to-end Training and Zero-shot Transfer

📅 2026-03-27

📈 Citations: 0

✨ Influential: 0

career value

162K/year

🤖 AI Summary

This work addresses the challenge of efficient and adaptive dynamic segmentation in long sequential data by proposing ReinPatch, a framework that formulates segment boundary selection as a discrete decision process. Leveraging reinforcement learning, ReinPatch jointly optimizes the segmentation policy and the downstream sequence model within an end-to-end training paradigm. It is the first method to support strict compression rate control, multi-level modeling, and decoupling into a general-purpose base segmenter amenable to zero-shot transfer. Built upon the Group Relative Policy Gradient (GRPG) algorithm, ReinPatch outperforms existing data-driven segmentation approaches on time series forecasting tasks while offering interpretable segmentation behavior.

Technology Category

Application Category

📝 Abstract

Efficiently aggregating spatial or temporal horizons to acquire compact representations has become a unifying principle in modern deep learning models, yet learning data-adaptive representations for long-horizon sequence data, especially continuous sequences like time series, remains an open challenge. While fixed-size patching has improved scalability and performance, discovering variable-sized, data-driven patches end-to-end often forces models to rely on soft discretization, specific backbones, or heuristic rules. In this work, we propose Reinforcement Patching (ReinPatch), the first framework to jointly optimize a sequence patching policy and its downstream sequence backbone model using reinforcement learning. By formulating patch boundary placement as a discrete decision process optimized via Group Relative Policy Gradient (GRPG), ReinPatch bypasses the need for continuous relaxations and performs dynamic patching policy optimization in a natural manner. Moreover, our method allows strict enforcement of a desired compression rate, freeing the downstream backbone to scale efficiently, and naturally supports multi-level hierarchical modeling. We evaluate ReinPatch on time-series forecasting datasets, where it demonstrates compelling performance compared to state-of-the-art data-driven patching strategies. Furthermore, our detached design allows the patching module to be extracted as a standalone foundation patcher, providing the community with visual and empirical insights into the segmentation behaviors preferred by a purely performance-driven neural patching strategy.

Problem

Research questions and friction points this paper is trying to address.

dynamic tokenization

data-adaptive representation

long-horizon sequence

time series

variable-sized patching

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement Patching

Dynamic Tokenization

Group Relative Policy Gradient