LaneRoPE: Positional Encoding for Collaborative Parallel Reasoning and Generation

📅 2026-05-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Traditional parallel generation methods sample sequences independently, so no sequence can reuse the intermediate generations, computations, or observations of the others. This work proposes the first explicit cross-sequence collaboration mechanism, LaneRoPE, which combines an inter-sequence attention mask with an extended Rotary Position Embedding (RoPE) that models relative positions both within and across sequences, at negligible additional inference overhead. On mathematical reasoning tasks, the approach yields additional accuracy gains when the generated sequence length is limited, which suggests that collaborative parallel generation is both effective and practical.
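As background for the RoPE extension mentioned above, the following is a minimal sketch of standard RoPE only, not of LaneRoPE itself, whose exact formulation is not given here. It illustrates the property any such extension builds on: the attention score between a rotated query and a rotated key depends only on their relative position. The function name `rope`, the half-split pairing of dimensions, and the base of 10000 are illustrative assumptions.

```python
import numpy as np

def rope(x, pos, base=10000.0):
    """Rotate a 1-D vector x (even length) by angles proportional to pos.

    Assumption: dimensions are paired as (x[i], x[i + d/2]); other RoPE
    implementations pair adjacent dimensions instead.
    """
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)  # one frequency per pair
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

rng = np.random.default_rng(0)
q, k = rng.normal(size=8), rng.normal(size=8)

# Same relative offset (3), very different absolute positions.
score_a = rope(q, 5) @ rope(k, 2)
score_b = rope(q, 105) @ rope(k, 102)
print(np.isclose(score_a, score_b))
```

Because the score is a function of the offset alone, a scheme that assigns positions to tokens in other sequences can, in principle, express cross-sequence relative positions without changing the attention computation.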
📝 Abstract
Parallel LLM test-time scaling techniques (e.g., best-of-$N$) require drawing $N>1$ sequences conditioned on the same input prompt. These methods boost accuracy while exploiting the computational efficiency of batching $N$ generations. However, each sequence in the batch is traditionally generated independently and hence does not reuse intermediate generations, computations, or observations from other sequences. In this paper, we propose LaneRoPE to enable coordination and collaboration among $N>1$ sequences at generation time. LaneRoPE involves two key ideas: (a) an inter-sequence attention mask to make sampling of sequences dependent on one another; and (b) a RoPE extension that injects positional information that captures relative positions between tokens, both within and outside a particular sequence. We evaluate our approach on mathematical reasoning tasks and find promising results: LaneRoPE enables collaboration among sequences, yielding additional accuracy gains under limited generated sequence length. Importantly, since LaneRoPE enables coordination with minimal changes to the underlying LLM architecture and introduces a negligible overhead at inference time, it is appealing to rapidly incorporate parallel reasoning into existing LLM inference pipelines.
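To make idea (a) concrete, here is a hypothetical sketch of an inter-sequence attention mask. The abstract does not specify the mask, so everything below is an assumption: the $N$ sequences ("lanes") are decoded in lockstep, tokens are laid out as the prompt followed by one token per lane per step, and a token may attend to the prompt, to its own lane up to the current step, and to other lanes only at strictly earlier steps. The name `lane_attention_mask` is illustrative.

```python
import numpy as np

def lane_attention_mask(prompt_len, n_lanes, n_steps, collaborative=True):
    """Boolean mask[q, k]: True where query token q may attend to key token k.

    Assumed layout: index = prompt_len + step * n_lanes + lane.
    """
    total = prompt_len + n_lanes * n_steps
    idx = np.arange(total)
    is_prompt = idx < prompt_len
    gen = idx - prompt_len
    lane = np.where(is_prompt, -1, gen % n_lanes)   # -1 marks prompt tokens
    step = np.where(is_prompt, -1, gen // n_lanes)

    q_gen, k_gen = ~is_prompt[:, None], ~is_prompt[None, :]
    q_lane, k_lane = lane[:, None], lane[None, :]
    q_step, k_step = step[:, None], step[None, :]

    # Every token sees prompt tokens at or before its own index.
    prompt_key = is_prompt[None, :] & (idx[None, :] <= idx[:, None])
    # Ordinary causal attention inside a lane.
    same_lane = q_gen & (q_lane == k_lane) & (k_step <= q_step)
    # Assumed collaboration rule: other lanes, strictly earlier steps only.
    other_lane = q_gen & k_gen & (q_lane != k_lane) & (k_step < q_step)

    mask = prompt_key | same_lane
    if collaborative:
        mask = mask | other_lane
    return mask

collab = lane_attention_mask(prompt_len=2, n_lanes=2, n_steps=2)
indep = lane_attention_mask(prompt_len=2, n_lanes=2, n_steps=2, collaborative=False)
# Token 4 is (lane 0, step 1); token 3 is (lane 1, step 0).
print(collab[4, 3], indep[4, 3])  # True False
```

With `collaborative=False` the mask reduces to independent sampling such as best-of-$N$, where each lane sees only the prompt and its own history. The strictly-earlier rule is one plausible choice because tokens in the same step are sampled concurrently and cannot see each other.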
Problem

Research questions and friction points this paper is trying to address.

parallel reasoning
test-time scaling
sequence collaboration
positional encoding
LLM inference
Innovation

Methods, ideas, or system contributions that make the work stand out.

LaneRoPE
parallel reasoning
inter-sequence attention
RoPE extension
test-time scaling