CLFEC: A New Task for Unified Linguistic and Factual Error Correction in paragraph-level Chinese Professional Writing

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the frequent co-occurrence and interaction of linguistic errors (word, grammar, and punctuation) and factual errors in paragraph-level Chinese professional writing, a challenge that traditional approaches handle poorly because they treat the two error types in isolation. To bridge this gap, we introduce a unified task, CLFEC (Chinese Linguistic & Factual Error Correction), and construct a mixed, multi-domain Chinese dataset spanning current affairs, finance, law, and medicine. Through a systematic study of LLM-based paradigms, from prompting to retrieval-augmented generation and agentic workflows, we show that correcting both error types within the same context outperforms decoupled processes. Agentic workflows can also be effective when paired with a suitable backbone model. The dataset and empirical findings offer practical guidance for building reliable, fully automatic proofreading systems in industrial settings.

📝 Abstract
Chinese text correction has traditionally focused on spelling and grammar, while factual error correction is usually treated separately. However, in paragraph-level Chinese professional writing, linguistic (word/grammar/punctuation) and factual errors frequently co-occur and interact, making unified correction both necessary and challenging. This paper introduces CLFEC (Chinese Linguistic & Factual Error Correction), a new task for joint linguistic and factual correction. We construct a mixed, multi-domain Chinese professional writing dataset spanning current affairs, finance, law, and medicine. We then conduct a systematic study of LLM-based correction paradigms, from prompting to retrieval-augmented generation (RAG) and agentic workflows. The analysis reveals practical challenges, including limited generalization of specialized correction models, the need for evidence grounding for factual repair, the difficulty of mixed-error paragraphs, and over-correction on clean inputs. Results further show that handling linguistic and factual errors within the same context outperforms decoupled processes, and that agentic workflows can be effective with suitable backbone models. Overall, our dataset and empirical findings provide guidance for building reliable, fully automatic proofreading systems in industrial settings.
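To make the contrast between same-context and decoupled correction concrete, the sketch below builds the prompts each setup might send to an LLM. It is only an illustration under stated assumptions: the instruction wording, the function names, and the placeholder token are hypothetical and are not taken from the paper, and no model is actually called.

```python
from typing import List

# Hypothetical instruction strings; the paper's actual prompts are not given here.
LINGUISTIC_INSTRUCTION = "Fix word, grammar, and punctuation errors."
FACTUAL_INSTRUCTION = "Fix factual errors, using only the evidence below."
KEEP_CLEAN_INSTRUCTION = "If the paragraph has no errors, return it unchanged."
# Stands in for the output of the first pass in the decoupled setup.
FIRST_PASS_PLACEHOLDER = "{linguistic_pass_output}"


def format_evidence(evidence: List[str]) -> str:
    """Render retrieved passages as a numbered list (empty string if none)."""
    return "\n".join(f"[{i}] {p}" for i, p in enumerate(evidence, start=1))


def build_joint_prompt(paragraph: str, evidence: List[str]) -> str:
    """One prompt: both error types are corrected within the same context."""
    return "\n\n".join([
        LINGUISTIC_INSTRUCTION + " " + FACTUAL_INSTRUCTION,
        KEEP_CLEAN_INSTRUCTION,
        "Evidence:\n" + format_evidence(evidence),
        "Paragraph:\n" + paragraph,
    ])


def build_decoupled_prompts(paragraph: str, evidence: List[str]) -> List[str]:
    """Two prompts: a linguistic pass, then a separate factual pass.

    The factual prompt carries a placeholder because, in a real pipeline,
    it would receive the first pass's output rather than the raw paragraph.
    """
    linguistic = "\n\n".join([
        LINGUISTIC_INSTRUCTION,
        KEEP_CLEAN_INSTRUCTION,
        "Paragraph:\n" + paragraph,
    ])
    factual = "\n\n".join([
        FACTUAL_INSTRUCTION,
        KEEP_CLEAN_INSTRUCTION,
        "Evidence:\n" + format_evidence(evidence),
        "Paragraph:\n" + FIRST_PASS_PLACEHOLDER,
    ])
    return [linguistic, factual]
```

In the decoupled setup the linguistic pass never sees the evidence and the factual pass never sees the original wording, which is one plausible reason interacting errors are harder to repair there. The `KEEP_CLEAN_INSTRUCTION` line reflects the over-correction concern the abstract raises for clean inputs.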
Problem

Research questions and friction points this paper is trying to address.

Chinese text correction
linguistic error
factual error
paragraph-level writing
unified correction
Innovation

Methods, ideas, or system contributions that make the work stand out.

unified error correction
factual error correction
Chinese professional writing
retrieval-augmented generation
agentic workflows
Jian Kai
Huazhong University of Science and Technology
Zidong Zhang
WPS AI, Kingsoft Office
Jiwen Chen
WPS AI, Kingsoft Office
Zhengxiang Wu
WPS AI, Kingsoft Office
Songtao Sun
WPS AI, Kingsoft Office
Fuyang Li
WPS AI, Kingsoft Office
Yang Cao
PhD, Associate Professor, Huazhong University of Science and Technology
Wireless Networking, QoE, Edge Computing, Internet of Things
Qiang Liu
WPS AI, Kingsoft Office