XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision

📅 2025-05-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large language models (LLMs) struggle to support high-quality scientific writing, particularly in maintaining cross-sectional conceptual coherence and enabling iterative, fine-grained revision—critical yet underaddressed requirements in academic authoring. This paper introduces a human-AI collaborative framework tailored for iterative revision of scholarly papers, operating at the paragraph level and driven by natural-language revision instructions. Our key contributions are threefold: (1) the first large-scale academic revision dataset, comprising 7,040 top-conference papers and over 140,000 real-world instruction-revision pairs; (2) an open-source, multi-scale (1.5B–14B) family of context-aware, academic revision–specific LLMs, integrating section-aware modeling and controllable generation; and (3) comprehensive evaluation—both automated and human—demonstrating substantial performance gains over comparable open-source models and approaching commercial closed-source systems, with measurable improvements in draft logic, factual accuracy, and linguistic quality.

📝 Abstract
Despite the growing adoption of large language models (LLMs) in academic workflows, their capabilities remain limited when it comes to supporting high-quality scientific writing. Most existing systems are designed for general-purpose scientific text generation and fail to meet the sophisticated demands of research communication beyond surface-level polishing, such as conceptual coherence across sections. Furthermore, academic writing is inherently iterative and revision-driven, a process not well supported by direct prompting-based paradigms. To address these limitations, we propose a human-AI collaboration framework for academic paper revision. We first introduce a comprehensive dataset of 7,040 research papers from top-tier venues, annotated with over 140,000 instruction-response pairs that reflect realistic, section-level scientific revisions. Building on this dataset, we develop XtraGPT, the first suite of open-source LLMs (ranging from 1.5B to 14B parameters) designed to provide context-aware, instruction-guided writing assistance. Extensive experiments validate that XtraGPT significantly outperforms same-scale baselines and approaches the quality of proprietary systems. Both automated preference assessments and human evaluations confirm the effectiveness of our models in improving scientific drafts.
Problem

Research questions and friction points this paper is trying to address.

LLMs lack support for high-quality scientific writing
Existing systems fail to ensure conceptual coherence in papers
Current AI tools poorly support iterative academic revision processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Human-AI collaboration framework for academic revision
Dataset of 7,040 papers with over 140,000 annotated instruction-response pairs
Open-source LLMs with context-aware writing assistance
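The paper describes paragraph-level revision driven by natural-language instructions with section-aware context. A minimal sketch of what one such instruction-revision example and its prompt assembly might look like is below; the field names and prompt format are illustrative assumptions, not the paper's actual schema.

```python
def build_revision_prompt(section: str, paragraph: str, instruction: str) -> str:
    """Assemble a section-aware revision prompt.

    Illustrative format only -- the actual prompt template used by
    XtraGPT is not specified in this summary.
    """
    return (
        f"Section: {section}\n"
        f"Paragraph:\n{paragraph}\n\n"
        f"Instruction: {instruction}\n"
        "Revised paragraph:"
    )


# Hypothetical instruction-revision pair mirroring the dataset description
# (paragraph-level edit guided by a natural-language instruction).
example = {
    "section": "Introduction",
    "paragraph": "LLMs is widely used in academic writing.",
    "instruction": "Fix the grammar and make the claim more precise.",
    "revision": "LLMs are increasingly used to assist academic writing.",
}

prompt = build_revision_prompt(
    example["section"], example["paragraph"], example["instruction"]
)
print(prompt)
```

A fine-tuned model would then be asked to complete the prompt with the revised paragraph, enabling the iterative, instruction-guided workflow the paper targets.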
Nuo Chen
National University of Singapore
Andre Lin HuiKai
National University of Singapore
Jiaying Wu
National University of Singapore
Natural Language Processing, Data Mining, Mis/Disinformation, Social Computing
Junyi Hou
National University of Singapore
Large Language Model, Federated Learning
Zining Zhang
National University of Singapore
Qian Wang
National University of Singapore
Xidong Wang
The Chinese University of Hong Kong, Shenzhen
Bingsheng He
National University of Singapore