Why Not Transform Chat Large Language Models to Non-English?

📅 2024-05-22
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
The scarcity of non-English data hinders the development of non-English large language models (LLMs), and efficiently transferring English chat LLMs to low-resource languages faces two key challenges: transferring advanced abilities (e.g., multi-turn dialogue, preference alignment) without supervised data, and catastrophic forgetting of the original knowledge. This paper proposes TransLLM, a lightweight transfer paradigm for chat LLMs that integrates translation chain-of-thought (CoT) task decomposition, low-rank adaptation (LoRA), and recovery knowledge distillation (recovery KD). Crucially, it requires only single-turn translation data, yet jointly improves multi-turn dialogue ability and safety. Experiments transferring LLaMA-2-chat-7B to Thai show that the method surpasses ChatGPT on MT-Bench and refuses more harmful queries on AdvBench than both ChatGPT and GPT-4, validating its effectiveness in capability transfer and safety preservation.

📝 Abstract
The scarcity of non-English data limits the development of non-English large language models (LLMs). Transforming English-centric LLMs to non-English has been identified as an effective and resource-efficient method. Previous works start from base LLMs and perform knowledge distillation (KD) with data generated by stronger LLMs, e.g., GPT-4. Compared to base LLMs, chat LLMs are further optimized for advanced abilities, e.g., multi-turn conversation and human preference alignment, and are thus more powerful in both helpfulness and safety. However, transforming a chat LLM involves two critical issues: (1) How can we effectively transfer advanced abilities without their supervised data? (2) How can we prevent catastrophic forgetting of the original knowledge during transformation? We target these issues by introducing a simple framework called TransLLM. For the first issue, TransLLM divides the transfer problem into several common sub-tasks with the translation chain-of-thought, which uses translation as a step-by-step bridge between English and non-English. We further enhance the performance of the sub-tasks with publicly available data. For the second issue, we propose a method comprising two synergistic components: low-rank adaptation, which keeps the original LLM parameters frozen during training, and recovery KD, which uses data generated by the chat LLM itself to recover the original knowledge from the frozen parameters. In our experiments, we transform LLaMA-2-chat-7B to Thai. Using only single-turn data, our method outperforms strong baselines and ChatGPT on the multi-turn benchmark MT-Bench. Furthermore, without any safety data, it rejects more harmful queries from the safety benchmark AdvBench than both ChatGPT and GPT-4.
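The translation chain-of-thought described in the abstract can be sketched as a three-step pipeline: translate the non-English query to English, answer in English, then translate the answer back. The sketch below is an illustrative assumption, not the paper's actual templates or code; `call_llm` is a hypothetical stand-in for a chat-LLM backend.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a chat-LLM call; replace with a real backend."""
    return f"<response to: {prompt[:40]}...>"

def translation_cot(query_th: str) -> dict:
    """Decompose a Thai query into translate -> respond -> back-translate steps."""
    query_en = call_llm(f"Translate this Thai text to English:\n{query_th}")
    answer_en = call_llm(f"Answer the following question:\n{query_en}")
    answer_th = call_llm(f"Translate this English text to Thai:\n{answer_en}")
    # Keeping the intermediate steps explicit is what lets each sub-task be
    # trained with publicly available single-turn translation data.
    return {"query_en": query_en, "answer_en": answer_en, "answer_th": answer_th}

steps = translation_cot("ปัญญาประดิษฐ์คืออะไร")  # "What is artificial intelligence?"
print(list(steps))
```

Because translation acts as the bridge, only the translation sub-tasks need target-language supervision; the English answering step reuses the chat LLM's existing abilities.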
Problem

Research questions and friction points this paper is trying to address.

Transferring chat LLMs' advanced abilities without supervised data
Preventing catastrophic forgetting of original knowledge during transformation
Enabling effective non-English adaptation of English-centric chat LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Translation chain-of-thought for ability transfer
Low-rank adaptation to maintain original parameters
Recovery knowledge distillation using self-generated data
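The low-rank adaptation used to prevent forgetting can be illustrated with a minimal numerical sketch: the pretrained weight `W` stays frozen, and only a low-rank update `B @ A` is trained on top of it. This follows the standard LoRA formulation with NumPy stand-in matrices; the dimensions and scaling here are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.normal(size=(d_out, d_in))      # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d_in)) * 0.01   # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: frozen path plus scaled low-rank update (alpha/r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapted model initially matches the frozen one,
# so the original knowledge is recoverable from W throughout training.
assert np.allclose(lora_forward(x), W @ x)
```

Keeping `W` frozen is what makes recovery KD possible: the chat LLM's original behavior can still be sampled from the frozen parameters and distilled back in.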
Xiang Geng
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Ming Zhu
Huawei Translation Services Center, Beijing, China
Jiahuan Li
Meituan Inc.
Natural Language Processing
Zhejian Lai
Master's student, Nanjing University
Natural Language Processing
Wei Zou
PKU, Samsung, Baidu, Didi, Ke
Speech, NLP, LLM, Multimodal
Shuaijie She
National Key Laboratory for Novel Software Technology, Nanjing University
Reasoning, Alignment, Multilingual
Jiaxin Guo
Huawei Translation Services Center, Beijing, China
Xiaofeng Zhao
Huawei Translation Services Center, Beijing, China
Yinglu Li
Huawei Translation Services Center, Beijing, China
Yuang Li
2012 Lab, Huawei
Speech, NLP
Chang Su
Huawei Translation Services Center, Beijing, China
Yanqing Zhao
Huawei
AI, MT
Min Zhang
Huawei Translation Services Center, Beijing, China
Hao Yang
Huawei Translation Services Center, Beijing, China
Xinglin Lyu
PhD student of Software Engineering, Soochow University
Machine Translation, Natural Language Processing
Jiajun Chen
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Shujian Huang
School of Computer Science, Nanjing University
Natural Language Processing, Machine Translation, Multilingualism, Large Language Models