CLFEC: A New Task for Unified Linguistic and Factual Error Correction in paragraph-level Chinese Professional Writing

📅 2026-02-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the frequent co-occurrence and interaction of linguistic errors (word, grammar, and punctuation) and factual errors in paragraph-level Chinese professional writing, a problem that traditional approaches handle poorly because they treat the two error types in isolation. To bridge this gap, we introduce a new unified task, CLFEC (Chinese Linguistic & Factual Error Correction), and construct a mixed, multi-domain Chinese dataset spanning current affairs, finance, law, and medicine. Through a systematic study of LLM-based paradigms, covering prompting, retrieval-augmented generation (RAG), and agentic workflows, we show that correcting linguistic and factual errors within the same context outperforms decoupled processing. Agentic workflows can also be effective when paired with a suitable backbone model. The dataset and findings offer practical guidance for building reliable, fully automatic proofreading systems in industrial settings.

📝 Abstract

Chinese text correction has traditionally focused on spelling and grammar, while factual error correction is usually treated separately. However, in paragraph-level Chinese professional writing, linguistic (word/grammar/punctuation) and factual errors frequently co-occur and interact, making unified correction both necessary and challenging. This paper introduces CLFEC (Chinese Linguistic & Factual Error Correction), a new task for joint linguistic and factual correction. We construct a mixed, multi-domain Chinese professional writing dataset spanning current affairs, finance, law, and medicine. We then conduct a systematic study of LLM-based correction paradigms, from prompting to retrieval-augmented generation (RAG) and agentic workflows. The analysis reveals practical challenges, including limited generalization of specialized correction models, the need for evidence grounding for factual repair, the difficulty of mixed-error paragraphs, and over-correction on clean inputs. Results further show that handling linguistic and factual errors within the same context outperforms decoupled processes, and that agentic workflows can be effective with suitable backbone models. Overall, our dataset and empirical findings provide guidance for building reliable, fully automatic proofreading systems in industrial settings.
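
The contrast between same-context and decoupled correction can be illustrated with a minimal Python sketch. This is not the paper's implementation: the `LLM` callable, the prompt wording, and every function name below are assumptions made for illustration. The joint variant issues one model call covering both error types with evidence attached; the decoupled variant runs a linguistic pass first and a separate evidence-grounded factual pass second. A small helper flags over-correction on clean inputs.

```python
from typing import Callable, List

# Hypothetical interface: any function mapping a prompt string to model output.
LLM = Callable[[str], str]

LINGUISTIC = "word, grammar, and punctuation errors"
FACTUAL = "factual errors, using only the evidence provided"


def build_prompt(paragraph: str, targets: List[str], evidence: List[str]) -> str:
    """Assemble a correction prompt. The wording is illustrative, not the paper's."""
    lines = [
        "Correct the following Chinese paragraph.",
        "Fix: " + "; ".join(targets) + ".",
        "If the paragraph has no errors, return it unchanged.",
    ]
    if evidence:
        lines.append("Evidence:")
        lines.extend(f"- {item}" for item in evidence)
    lines.append("Paragraph:")
    lines.append(paragraph)
    return "\n".join(lines)


def correct_joint(paragraph: str, evidence: List[str], llm: LLM) -> str:
    """One call: linguistic and factual errors are handled in the same context."""
    return llm(build_prompt(paragraph, [LINGUISTIC, FACTUAL], evidence))


def correct_decoupled(paragraph: str, evidence: List[str], llm: LLM) -> str:
    """Two calls: a linguistic pass, then a factual pass on its output."""
    after_linguistic = llm(build_prompt(paragraph, [LINGUISTIC], []))
    return llm(build_prompt(after_linguistic, [FACTUAL], evidence))


def is_over_correction(source: str, output: str, reference: str) -> bool:
    """True when the input was already clean but the system changed it anyway."""
    return source == reference and output != source
```

In the decoupled variant the factual pass only sees the output of the linguistic pass, so an interaction between the two error types cannot be resolved in a single step, which is one plausible reading of why the abstract reports better results for same-context handling.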

Problem

Research questions and friction points this paper is trying to address.

Chinese text correction

linguistic error

factual error

paragraph-level writing

unified correction

Innovation

Methods, ideas, or system contributions that make the work stand out.

unified error correction

factual error correction

Chinese professional writing