🤖 AI Summary
Current large language models (LLMs) struggle to support high-quality scientific writing, particularly in maintaining conceptual coherence across sections and enabling iterative, fine-grained revision, both critical yet underaddressed requirements in academic authoring. This paper introduces a human-AI collaborative framework for the iterative revision of scholarly papers, operating at the paragraph level and driven by natural-language revision instructions. The key contributions are threefold: (1) the first large-scale academic revision dataset, comprising 7,040 top-conference papers and over 140,000 real-world instruction-revision pairs; (2) an open-source, multi-scale (1.5B to 14B parameters) family of context-aware LLMs specialized for academic revision, integrating section-aware modeling and controllable generation; and (3) comprehensive automated and human evaluation demonstrating substantial performance gains over comparably sized open-source models, approaching the quality of commercial closed-source systems, with measurable improvements in draft logic, factual accuracy, and linguistic quality.
📝 Abstract
Despite the growing adoption of large language models (LLMs) in academic workflows, their capabilities remain limited when it comes to supporting high-quality scientific writing. Most existing systems are designed for general-purpose scientific text generation and fail to meet the sophisticated demands of research communication beyond surface-level polishing, such as maintaining conceptual coherence across sections. Furthermore, academic writing is inherently iterative and revision-driven, a process not well supported by direct prompting-based paradigms. To address these challenges, we propose a human-AI collaboration framework for academic paper revision. We first introduce a comprehensive dataset of 7,040 research papers from top-tier venues, annotated with over 140,000 instruction-response pairs that reflect realistic, section-level scientific revisions. Building on this dataset, we develop XtraGPT, the first suite of open-source LLMs, ranging from 1.5B to 14B parameters, designed to provide context-aware, instruction-guided writing assistance. Extensive experiments validate that XtraGPT significantly outperforms same-scale baselines and approaches the quality of proprietary systems. Both automated preference assessments and human evaluations confirm the effectiveness of our models in improving scientific drafts.