Prefix Grouper: Efficient GRPO Training through Shared-Prefix Forward

📅 2025-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
GRPO suffers from significant computational redundancy in long-context training because the shared input prefix is re-encoded for every group member, severely limiting scalability. To address this, the paper proposes Prefix Grouper, an end-to-end redundancy-free GRPO training strategy built on a Shared-Prefix Forward. The method restructures self-attention to decouple prefix-shared from suffix-independent computation, caching prefix activations and dynamically constructing inputs while preserving gradient flow, so the shared prefix is encoded only once per group, with a theoretical guarantee of exact equivalence to standard GRPO. As a plug-and-play module, Prefix Grouper is fully compatible with existing architectures and enables larger group sizes and longer contexts. Experiments demonstrate substantial reductions in training cost, especially with long shared prefixes, without compromising the optimization trajectory or policy performance. The implementation is publicly available.

📝 Abstract
Group Relative Policy Optimization (GRPO) enhances policy learning by computing gradients from relative comparisons among candidate outputs that share a common input prefix. Despite its effectiveness, GRPO introduces substantial computational overhead when processing long shared prefixes, which must be redundantly encoded for each group member. This inefficiency becomes a major scalability bottleneck in long-context learning scenarios. We propose Prefix Grouper, an efficient GRPO training algorithm that eliminates redundant prefix computation via a Shared-Prefix Forward strategy. In particular, by restructuring self-attention into two parts, our method enables the shared prefix to be encoded only once, while preserving full differentiability and compatibility with end-to-end training. We provide both theoretical and empirical evidence that Prefix Grouper is training-equivalent to standard GRPO: it yields identical forward outputs and backward gradients, ensuring that the optimization dynamics and final policy performance remain unchanged. Empirically, our experiments confirm that Prefix Grouper achieves consistent results while significantly reducing the computational cost of training, particularly in long-prefix scenarios. The proposed method is fully plug-and-play: it is compatible with existing GRPO-based architectures and can be seamlessly integrated into current training pipelines as a drop-in replacement, requiring no structural modifications and only minimal changes to input construction and attention computation. Prefix Grouper enables the use of larger group sizes under the same computational budget, thereby improving the scalability of GRPO to more complex tasks and larger models. Code is now available at https://github.com/johncaged/PrefixGrouper
Problem

Research questions and friction points this paper is trying to address.

Reduces redundant prefix computation in GRPO training
Improves scalability for long-context learning scenarios
Maintains training equivalence with standard GRPO
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shared-Prefix Forward eliminates redundant computation
Restructures self-attention for single prefix encoding
Plug-and-play compatible with existing GRPO architectures
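The attention restructuring described above can be illustrated with a minimal single-head sketch. This is not the paper's released implementation (available at the linked repository); it is a NumPy toy that shows the core idea: encode the prefix's keys and values once, then let each group member's suffix queries attend to the cached prefix plus its own causally masked suffix. The baseline, which re-encodes the prefix for every member, produces identical outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
d, P, S, G = 8, 6, 4, 3  # head dim, prefix length, suffix length, group size

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_attn(x, Wq, Wk, Wv):
    """Standard causal self-attention over a full sequence."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    n = x.shape[0]
    s = q @ k.T / np.sqrt(d)
    s = np.where(np.tril(np.ones((n, n), dtype=bool)), s, -np.inf)
    return softmax(s) @ v

Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
prefix = rng.standard_normal((P, d))
suffixes = [rng.standard_normal((S, d)) for _ in range(G)]

# Baseline GRPO forward: the prefix is re-encoded for every group member.
baseline = [causal_attn(np.vstack([prefix, sfx]), Wq, Wk, Wv)[P:]
            for sfx in suffixes]

# Shared-prefix forward: project the prefix to K/V once, reuse for all members.
pk, pv = prefix @ Wk, prefix @ Wv
shared = []
for sfx in suffixes:
    q = sfx @ Wq
    k = np.vstack([pk, sfx @ Wk])
    v = np.vstack([pv, sfx @ Wv])
    s = q @ k.T / np.sqrt(d)
    # Suffix queries see the entire prefix; causal mask applies within the suffix.
    mask = np.hstack([np.ones((S, P), dtype=bool),
                      np.tril(np.ones((S, S), dtype=bool))])
    s = np.where(mask, s, -np.inf)
    shared.append(softmax(s) @ v)

# Exact equivalence: same attention outputs, prefix encoded once instead of G times.
assert all(np.allclose(b, sh) for b, sh in zip(baseline, shared))
```

The equivalence holds because, under causal masking, the prefix positions' keys and values do not depend on the suffix, so computing them once per group changes nothing in the forward pass; the paper extends this argument to the backward pass to guarantee identical gradients.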
Zikang Liu
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Tongtian Yue
Institute of Automation, Chinese Academy of Sciences
Multimodal Pretraining, Visual-Language
Yepeng Tang
Beijing Jiaotong University
VideoLLM, Video Understanding
Longteng Guo
Institute of Automation, Chinese Academy of Sciences
Junxian Cai
Basic Algorithm Center, Tencent
Qingbin Liu
Basic Algorithm Center, Tencent
Xi Chen
Basic Algorithm Center, Tencent
Jing Liu
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences