CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer

📅 2024-01-11

🏛️ arXiv.org

📈 Citations: 13

✨ Influential: 1

🤖 AI Summary

Chinese long-text style transfer faces challenges including rhetorical complexity, cultural implicitness, and structural verbosity, making it difficult for existing methods to simultaneously ensure stylistic accuracy and content fidelity. To address this, we propose CAT-LLM, the first framework for fine-grained style modeling tailored to Chinese discourse-level texts. It introduces a plug-and-play Text Style Definition (TSD) module enabling dual-level (lexical and sentential) style analysis and dynamic expansion of style trees. We also construct the first Chinese discourse-level style transfer parallel evaluation dataset. CAT-LLM integrates machine learning–based feature extraction, multi-level style representation, prompt engineering, and ChatGPT-assisted data construction. Experimental results demonstrate that CAT-LLM significantly outperforms baselines across five Chinese article-style transfer tasks, achieving concurrent improvements in stylistic accuracy and content preservation, while maintaining compatibility with multiple mainstream Chinese large language models.

📝 Abstract

Text style transfer is increasingly prominent in online entertainment and social media. However, existing research mainly concentrates on style transfer within individual English sentences and ignores the complexity of long Chinese texts, which limits the wider applicability of style transfer in the digital media realm. To bridge this gap, we propose a Chinese Article-style Transfer framework (CAT-LLM) that leverages the capabilities of Large Language Models (LLMs). CAT-LLM incorporates a bespoke, pluggable Text Style Definition (TSD) module that comprehensively analyzes the text features of articles and prompts LLMs to transfer Chinese article style efficiently. The TSD module integrates a series of machine learning algorithms to analyze article style at both the word and sentence levels, thereby helping LLMs thoroughly grasp the target style without compromising the integrity of the original text. In addition, this module supports dynamic expansion of its internal style trees, which makes it broadly compatible and allows flexible optimization in subsequent research. Moreover, we select five Chinese articles with distinct styles and create five parallel datasets using ChatGPT, which makes model evaluation more accurate and establishes a novel paradigm for evaluating subsequent research on article-style transfer. Extensive experimental results affirm that CAT-LLM outperforms current research in terms of transfer accuracy and content preservation, and that it applies well to various types of LLMs.
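The abstract describes the TSD module only at a high level: analyze a styled article at the word and sentence levels, then hand the result to an LLM as part of the prompt. The Python sketch below illustrates that general flow under loud assumptions. The feature set (`FUNCTION_WORDS`, average sentence length), the function names, and the prompt wording are all hypothetical and are not taken from the paper; the "word-level" step here is a character-frequency proxy, since real Chinese word segmentation would need an external tokenizer.

```python
import re
from collections import Counter

# Hypothetical marker characters; the paper's actual word-level features are not given here.
FUNCTION_WORDS = ["的", "了", "之", "乎", "者", "也"]


def split_sentences(text):
    """Split Chinese text on sentence-final punctuation."""
    parts = re.split(r"[。！？!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def sentence_level_features(text):
    """Sentence count and mean sentence length in characters."""
    sentences = split_sentences(text)
    lengths = [len(s) for s in sentences]
    return {
        "sentence_count": len(sentences),
        "avg_sentence_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


def word_level_features(text):
    """Relative frequency of each marker character (a proxy for word-level style)."""
    counts = Counter(text)
    total = sum(1 for ch in text if not ch.isspace()) or 1
    return {w: counts[w] / total for w in FUNCTION_WORDS}


def build_style_prompt(style_text, source_text):
    """Assemble a prompt that states the style definition before the text to rewrite."""
    words = word_level_features(style_text)
    sents = sentence_level_features(style_text)
    lines = ["Style definition (illustrative):"]
    lines += [f"- frequency of '{w}': {freq:.3f}" for w, freq in words.items()]
    lines += [f"- {name}: {value}" for name, value in sents.items()]
    lines.append("Rewrite the following text in this style while keeping its content:")
    lines.append(source_text)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_style_prompt("我来了。你好吗？", "今天天气很好。"))
```

Keeping feature extraction separate from prompt assembly mirrors the "pluggable" property the abstract claims: the same style definition could, in principle, be prepended to a prompt for any LLM.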

Problem

Research questions and friction points this paper is trying to address.

Handles the complexity of long Chinese texts in style transfer

Integrates Text Style Definition module for better style adaptation

Creates parallel datasets for robust model evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Pluggable Text Style Definition module for Chinese styles

Dynamic expansion of internal style trees

Parallel datasets for robust model evaluation
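The "dynamic expansion of internal style trees" can be pictured as a tree whose branches are analysis levels (words, sentences) and whose leaves are individual style features, with new leaves attachable at any time. The sketch below is one possible reading, not the paper's data structure: the `StyleNode` class, its methods, and the example feature names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class StyleNode:
    """One node of an illustrative style tree (hypothetical structure)."""
    name: str
    description: str = ""
    children: list = field(default_factory=list)

    def add(self, child):
        """Attach a new sub-feature; this is the 'dynamic expansion' step."""
        self.children.append(child)
        return child

    def find(self, name):
        """Depth-first search for a node by name; returns None if absent."""
        if self.name == name:
            return self
        for child in self.children:
            hit = child.find(name)
            if hit is not None:
                return hit
        return None

    def render(self, depth=0):
        """Flatten the tree into indented lines suitable for a prompt."""
        label = self.name + (": " + self.description if self.description else "")
        lines = ["  " * depth + "- " + label]
        for child in self.children:
            lines.extend(child.render(depth + 1))
        return lines


# Build the two levels named in the abstract, then extend one of them later.
root = StyleNode("style")
root.add(StyleNode("words"))
root.add(StyleNode("sentences"))
root.find("sentences").add(StyleNode("average length", "short"))
```

Because `render` walks whatever nodes exist at call time, adding a feature changes the generated style description without touching the code that consumes it.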
