AI Summary
This work addresses the tendency of large language models to produce scattered, meaning-altering edits when revising human-written arguments, whereas human editors bundle dependent changes into self-contained, meaning-preserving edits. To close this gap, the authors propose a reinforcement learning approach that trains models via group relative policy optimization to generate sentence-level, human-like edit suggestions that can be accepted or rejected independently. The method employs a multi-component reward function that combines edit-level semantic similarity, fluency, and pattern conformity with argument-level appropriateness. In both automatic and human evaluation, the approach outperforms competitive baselines and the state of the art in human-like editing, and multi-round editing achieves appropriateness close to that of full rewrites.
Abstract
Editing human-written text has become a standard use case of large language models (LLMs), for example, to make one's arguments more appropriate for a discussion. Comparing human and LLM-generated edits, however, we observe a mismatch in editing strategies: While LLMs often perform multiple scattered edits and tend to change meaning notably, humans tend to encapsulate dependent changes in self-contained, meaning-preserving edits. In this paper, we present a reinforcement learning approach that teaches LLMs human-like editing to improve the appropriateness of arguments. Our approach produces self-contained sentence-level edit suggestions that can be accepted or rejected independently. We train the approach using group relative policy optimization with a multi-component reward function that jointly optimizes edit-level semantic similarity, fluency, and pattern conformity as well as argument-level appropriateness. In automatic and human evaluation, it outperforms competitive baselines and the state of the art in human-like editing, with multi-round editing achieving appropriateness close to full rewriting.
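To make the training signal more concrete, the following is a minimal sketch of how a multi-component reward could feed group relative policy optimization. It is an illustration under stated assumptions, not the authors' implementation: the abstract names the four reward components but does not say how they are combined, so the weighted sum, the equal weights, the [0, 1] score range, and all identifiers (`EditScores`, `combined_reward`, `group_relative_advantages`) are hypothetical. The only part taken from the standard formulation of group relative policy optimization is the advantage, which normalizes each sampled output's reward by the mean and standard deviation of its group.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class EditScores:
    """Hypothetical component scores for one sampled edit, assumed to lie in [0, 1]."""
    similarity: float        # edit-level semantic similarity
    fluency: float           # edit-level fluency
    pattern: float           # edit-level pattern conformity
    appropriateness: float   # argument-level appropriateness


# Illustrative equal weights; the abstract does not specify the combination.
WEIGHTS = {
    "similarity": 0.25,
    "fluency": 0.25,
    "pattern": 0.25,
    "appropriateness": 0.25,
}


def combined_reward(scores: EditScores, weights=WEIGHTS) -> float:
    """Collapse the component scores into one scalar via a weighted sum (an assumption)."""
    return sum(w * getattr(scores, name) for name, w in weights.items())


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its group: (r - mean) / (std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# A group of three edits sampled for the same input sentence (made-up scores).
group = [
    EditScores(0.9, 0.8, 0.7, 0.6),
    EditScores(0.5, 0.9, 0.4, 0.8),
    EditScores(0.3, 0.4, 0.2, 0.3),
]
rewards = [combined_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

Because the advantage is relative to the group, an edit is reinforced only when it scores better than the other edits sampled for the same input, so no separately learned value function is needed in this setup.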