BoundRL: Efficient Structured Text Segmentation through Reinforced Boundary Generation

📅 2025-10-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Semantic segmentation of complex structured text (containing tables, code blocks, placeholders, and other non-linguistic elements) remains challenging for conventional sentence- or paragraph-level approaches, which fail to model such heterogeneous content. Method: We propose a token-level segmentation framework trained with Reinforcement Learning with Verifiable Rewards (RLVR). Instead of generating full segments, the model emits only each segment's starting tokens; locating these tokens in the original text then reconstructs the segment contents, which cuts inference cost and mitigates hallucination by avoiding explicit generation of segment bodies. A reward function jointly optimizes reconstruction fidelity and semantic alignment, while systematic perturbation of generated boundary sequences produces intermediate candidates that alleviate entropy collapse. Results: Our 1.7B-parameter model outperforms few-shot prompting of much larger language models on LLM prompt segmentation, and improves accuracy, cross-domain generalization, and inference efficiency over supervised fine-tuning baselines.

📝 Abstract
As structured texts become increasingly complex across diverse domains -- from technical reports to generative AI prompts -- the need for text segmentation into semantically meaningful components becomes critical. Such texts often contain elements beyond plain language, including tables, code snippets, and placeholders, which conventional sentence- or paragraph-level segmentation methods cannot handle effectively. To address this challenge, we propose BoundRL, a novel and efficient approach that jointly performs token-level text segmentation and label prediction for long structured texts. Instead of generating complete contents for each segment, it generates only a sequence of starting tokens and reconstructs the complete contents by locating these tokens within the original texts, thereby reducing inference costs by orders of magnitude and minimizing hallucination. To adapt the model for the output format, BoundRL performs reinforcement learning with verifiable rewards (RLVR) with a specifically designed reward that jointly optimizes document reconstruction fidelity and semantic alignment. To mitigate entropy collapse, it further constructs intermediate candidates by systematically perturbing a fraction of generated sequences of segments to create stepping stones toward higher-quality solutions. To demonstrate BoundRL's effectiveness on particularly challenging structured texts, we focus evaluation on complex prompts used for LLM applications. Experiments show that BoundRL enables small language models (1.7B parameters) to outperform few-shot prompting of much larger models. Moreover, RLVR with our designed reward yields significant improvements over supervised fine-tuning, and incorporating intermediate candidates further improves both performance and generalization.
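The abstract's core mechanism (emit only each segment's starting tokens, then recover the full segments by locating those tokens in the source text) can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, the left-to-right search policy, and the skip-on-miss fallback are all assumptions:

```python
def reconstruct_segments(original: str, start_strings: list[str]) -> list[str]:
    """Locate each generated start string in the original text and slice
    out the full segments between consecutive boundary positions.

    Illustrative sketch: the paper's actual localization procedure may
    differ (e.g., fuzzy matching, token-level rather than character-level).
    """
    boundaries = []
    cursor = 0
    for s in start_strings:
        # Search left-to-right, past previously found boundaries, so
        # repeated start strings map to distinct positions.
        idx = original.find(s, cursor)
        if idx == -1:
            continue  # unlocatable start string: skip it (one possible policy)
        boundaries.append(idx)
        cursor = idx + 1
    boundaries.append(len(original))
    # Slice between consecutive boundaries to rebuild the segments.
    return [original[a:b] for a, b in zip(boundaries, boundaries[1:])]


doc = "Intro text.\n| col1 | col2 |\nprint('hello')\n"
segs = reconstruct_segments(doc, ["Intro", "| col1", "print"])
```

Because segments are slices of the original text, their concatenation reproduces the document exactly, which is what makes reconstruction fidelity a verifiable reward signal.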
Problem

Research questions and friction points this paper is trying to address.

Segmenting complex structured texts with diverse elements like tables and code snippets
Reducing inference costs and hallucinations in text segmentation tasks
Improving semantic alignment and reconstruction fidelity for structured text segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates only starting tokens for segments
Uses reinforcement learning with verifiable rewards
Constructs intermediate candidates through systematic perturbation
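The perturbation step in the last bullet is not detailed on this page. One plausible sketch, assuming boundaries are character offsets and "perturbation" means jittering a random fraction of them to produce intermediate candidate segmentations:

```python
import random


def perturb_boundaries(boundaries: list[int], text_len: int,
                       frac: float = 0.3, rng=None) -> list[int]:
    """Create an intermediate candidate by jittering a random fraction of a
    candidate's boundary positions.

    Illustrative sketch only: the paper's actual perturbation operators
    (and whether they act on tokens or characters) are assumptions here.
    """
    rng = rng or random.Random(0)
    out = []
    for b in boundaries:
        if rng.random() < frac:
            # Shift the boundary by a small random offset, clamped to the text.
            b = min(max(b + rng.randint(-5, 5), 0), text_len)
        out.append(b)
    # Deduplicate and re-sort so the result is still a valid segmentation.
    return sorted(set(out))
```

Such perturbed candidates sit between a generated solution and a higher-reward one, giving the RLVR objective intermediate stepping stones and keeping the policy's output distribution from collapsing.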
👥 Authors
Haoyuan Li
University of North Carolina at Chapel Hill
Zhengyuan Shen
Amazon
Sullam Jeoung
University of Illinois Urbana-Champaign
Yueyan Chen
Amazon
Jiayu Li
Amazon Web Services
Qi Zhu
Amazon Web Services
Shuai Wang
Amazon Web Services
Vassilis Ioannidis
Amazon Web Services
Huzefa Rangwala
Professor of Computer Science, George Mason / ML Scientist, Amazon