CAT-LLM: Prompting Large Language Models with Text Style Definition for Chinese Article-style Transfer

๐Ÿ“… 2024-01-11
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 13
โœจ Influential: 1
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Chinese long-text style transfer faces challenges including rhetorical complexity, cultural implicitness, and structural verbosity, making it difficult for existing methods to simultaneously ensure stylistic accuracy and content fidelity. To address this, we propose CAT-LLM, the first framework for fine-grained style modeling tailored to Chinese discourse-level texts. It introduces a plug-and-play Text Style Definition (TSD) module enabling dual-level (lexical and sentential) style analysis and dynamic expansion of style trees. We also construct the first Chinese discourse-level style transfer parallel evaluation dataset. CAT-LLM integrates machine learningโ€“based feature extraction, multi-level style representation, prompt engineering, and ChatGPT-assisted data construction. Experimental results demonstrate that CAT-LLM significantly outperforms baselines across five Chinese article-style transfer tasks, achieving concurrent improvements in stylistic accuracy and content preservation, while maintaining compatibility with multiple mainstream Chinese large language models.

๐Ÿ“ Abstract
Text style transfer is increasingly prominent in online entertainment and social media. However, existing research mainly concentrates on style transfer within individual English sentences, while ignoring the complexity of long Chinese texts, which limits the wider applicability of style transfer in the digital media realm. To bridge this gap, we propose a Chinese Article-style Transfer framework (CAT-LLM), leveraging the capabilities of Large Language Models (LLMs). CAT-LLM incorporates a bespoke, pluggable Text Style Definition (TSD) module that comprehensively analyzes the textual features of articles, prompting LLMs to efficiently transfer Chinese article styles. The TSD module integrates a series of machine learning algorithms to analyze article style at both the word and sentence levels, thereby helping LLMs thoroughly grasp the target style without compromising the integrity of the original text. In addition, this module supports dynamic expansion of its internal style trees, showcasing robust compatibility and allowing flexible optimization in subsequent research. Moreover, we select five Chinese articles with distinct styles and create five parallel datasets using ChatGPT, improving the accuracy of model performance evaluation and establishing a novel paradigm for evaluating subsequent research on article-style transfer. Extensive experimental results affirm that CAT-LLM outperforms current approaches in terms of transfer accuracy and content preservation, and is readily applicable to various types of LLMs.
Problem

Research questions and friction points this paper is trying to address.

Existing methods struggle with the rhetorical and structural complexity of long Chinese texts
Stylistic accuracy and content fidelity are hard to ensure simultaneously
No parallel datasets exist for robustly evaluating Chinese article-style transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pluggable Text Style Definition module for Chinese styles
Dynamic expansion of internal style trees
Parallel datasets for robust model evaluation
๐Ÿ”Ž Similar Papers
No similar papers found.
Zhen Tao
Technical University of Munich
Dinghao Xi
School of Information, Renmin University of China
Zhiyu Li
Tianjin University
Liumin Tang
School of Information, Renmin University of China
Wei Xu
School of Information, Renmin University of China