🤖 AI Summary
Existing structured text translation research primarily operates at the sentence level, failing to preserve document-level structural integrity in XML/HTML documents. To address this, we propose a formatting-aware reinforcement learning framework that jointly optimizes translation quality and structural fidelity via a structure-aware reward mechanism. We introduce two novel structural rewards: TreeSim, which measures tree-structure similarity, and Node-chrF, a node-level character n-gram F-score; additionally, we propose StrucAUC as a fine-grained, integrated evaluation metric for structural and semantic alignment. Our method builds upon a supervised fine-tuned base model and employs Group Relative Policy Optimization for end-to-end optimization. Evaluated on the SAP software documentation benchmark, our approach achieves statistically significant improvements across all six evaluation metrics, demonstrating its effectiveness in simultaneously ensuring structural preservation and semantic accuracy.
📝 Abstract
Recent works on structured text translation remain limited to the sentence level, as they struggle to effectively handle complex document-level XML or HTML structures. To address this, we propose **Format Reinforcement Learning (FormatRL)**, which employs Group Relative Policy Optimization on top of a supervised fine-tuned model to directly optimize novel structure-aware rewards: 1) TreeSim, which measures structural similarity between predicted and reference XML trees, and 2) Node-chrF, which measures translation quality at the level of XML nodes. Additionally, we apply StrucAUC, a fine-grained metric distinguishing between minor errors and major structural failures. Experiments on the SAP software-documentation benchmark demonstrate improvements across all six metrics, and an analysis further shows how different reward functions contribute to improvements in both structural and translation quality.
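To make the two structural rewards concrete, here is a minimal sketch of how metrics like TreeSim and Node-chrF could be computed. The paper's exact formulations are not given in the abstract, so the definitions below are illustrative assumptions: TreeSim is approximated as an F1 overlap over root-to-node tag paths, Node-chrF as a single-order character n-gram F-score averaged over nodes aligned by document order.

```python
# Illustrative sketches of TreeSim and Node-chrF style rewards.
# These are assumptions for exposition, not the paper's exact metrics.
import xml.etree.ElementTree as ET
from collections import Counter

def tag_paths(node, prefix=""):
    """Flatten an XML tree into root-to-node tag paths (structure only)."""
    path = f"{prefix}/{node.tag}"
    paths = [path]
    for child in node:
        paths.extend(tag_paths(child, path))
    return paths

def tree_sim(pred_xml, ref_xml):
    """TreeSim sketch: multiset-overlap F1 over tag paths."""
    pred = Counter(tag_paths(ET.fromstring(pred_xml)))
    ref = Counter(tag_paths(ET.fromstring(ref_xml)))
    overlap = sum((pred & ref).values())
    if not overlap:
        return 0.0
    p = overlap / sum(pred.values())
    r = overlap / sum(ref.values())
    return 2 * p * r / (p + r)

def chrf(pred, ref, n=3):
    """Simplified chrF: single n-gram order, no beta weighting."""
    def ngrams(s):
        return Counter(s[i:i + n] for i in range(len(s) - n + 1))
    pg, rg = ngrams(pred), ngrams(ref)
    overlap = sum((pg & rg).values())
    if not overlap:
        return 0.0
    p = overlap / sum(pg.values())
    r = overlap / sum(rg.values())
    return 2 * p * r / (p + r)

def node_chrf(pred_xml, ref_xml):
    """Node-chrF sketch: mean chrF over the text of nodes paired
    in document order (alignment by order is an assumption)."""
    pred_nodes = list(ET.fromstring(pred_xml).iter())
    ref_nodes = list(ET.fromstring(ref_xml).iter())
    scores = [chrf(p.text or "", r.text or "")
              for p, r in zip(pred_nodes, ref_nodes)
              if (p.text or r.text)]
    return sum(scores) / len(scores) if scores else 0.0
```

In an RL setup such as GRPO, scalar scores like these would be combined with a translation-quality reward and used to rank sampled outputs within each group; the weighting between structural and semantic terms is a design choice not specified in the abstract.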