AI Summary
Chinese long-text style transfer faces challenges including rhetorical complexity, cultural implicitness, and structural verbosity, making it difficult for existing methods to simultaneously ensure stylistic accuracy and content fidelity. To address this, we propose CAT-LLM, the first framework for fine-grained style modeling tailored to Chinese discourse-level texts. It introduces a plug-and-play Text Style Definition (TSD) module enabling dual-level (lexical and sentential) style analysis and dynamic expansion of style trees. We also construct the first Chinese discourse-level style transfer parallel evaluation dataset. CAT-LLM integrates machine learning-based feature extraction, multi-level style representation, prompt engineering, and ChatGPT-assisted data construction. Experimental results demonstrate that CAT-LLM significantly outperforms baselines across five Chinese article-style transfer tasks, achieving concurrent improvements in stylistic accuracy and content preservation, while maintaining compatibility with multiple mainstream Chinese large language models.
Abstract
Text style transfer is increasingly prominent in online entertainment and social media. However, existing research concentrates mainly on style transfer within individual English sentences and ignores the complexity of long Chinese texts, which limits the wider applicability of style transfer in the digital media realm. To bridge this gap, we propose a Chinese Article-style Transfer framework (CAT-LLM) that leverages the capabilities of Large Language Models (LLMs). CAT-LLM incorporates a bespoke, pluggable Text Style Definition (TSD) module that comprehensively analyzes the text features of articles and prompts LLMs to transfer Chinese article style efficiently. The TSD module integrates a series of machine learning algorithms to analyze article style at both the word and sentence levels, helping LLMs thoroughly grasp the target style without compromising the integrity of the original text. In addition, the module supports dynamic expansion of its internal style trees, which makes it broadly compatible and allows flexible optimization in subsequent research. Moreover, we select five Chinese articles with distinct styles and create five parallel datasets using ChatGPT, which improves the accuracy of model performance evaluation and establishes a novel paradigm for evaluating subsequent research on article-style transfer. Extensive experimental results affirm that CAT-LLM outperforms existing approaches in transfer accuracy and content preservation and applies well to various types of LLMs.
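To make the idea of a two-level style definition concrete, the following is a minimal sketch and not the authors' TSD module. It assumes character bigrams as a crude stand-in for word-level features (a real system would likely use a Chinese word segmenter) and uses sentence length and punctuation ratios as sentence-level features. All function names, feature choices, and the prompt wording are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    """Split on Chinese sentence-final punctuation, keeping the mark."""
    pieces = re.findall(r"[^。！？]+[。！？]?", text)
    return [p.strip() for p in pieces if p.strip()]


def word_level_features(text, top_k=5):
    """Assumed proxy for word-level style: most frequent character bigrams."""
    counts = Counter()
    # Count bigrams only inside runs of Chinese characters,
    # so that no bigram spans a punctuation mark.
    for run in re.findall(r"[\u4e00-\u9fff]+", text):
        counts.update(run[i:i + 2] for i in range(len(run) - 1))
    return {"top_bigrams": [bigram for bigram, _ in counts.most_common(top_k)]}


def sentence_level_features(text):
    """Assumed sentence-level style: length and sentence-type ratios."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n == 0:
        return {"num_sentences": 0, "avg_length": 0.0,
                "question_ratio": 0.0, "exclamation_ratio": 0.0}
    return {
        "num_sentences": n,
        "avg_length": sum(len(s) for s in sentences) / n,
        "question_ratio": sum(s.endswith("？") for s in sentences) / n,
        "exclamation_ratio": sum(s.endswith("！") for s in sentences) / n,
    }


def build_prompt(style_text, source_text):
    """Turn the extracted features into a style-definition prompt (hypothetical wording)."""
    words = word_level_features(style_text)
    sents = sentence_level_features(style_text)
    style_definition = (
        f"Frequent expressions: {', '.join(words['top_bigrams'])}. "
        f"Average sentence length: {sents['avg_length']:.1f} characters. "
        f"Share of questions: {sents['question_ratio']:.0%}. "
        f"Share of exclamations: {sents['exclamation_ratio']:.0%}."
    )
    return (
        "Rewrite the source text in the target style while preserving its content.\n"
        f"Target style definition: {style_definition}\n"
        f"Source text: {source_text}"
    )
```

The resulting prompt string could then be sent to any LLM, which loosely mirrors the pluggable design described above: the feature extractors can be swapped or extended without changing how the prompt is consumed.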