Investigating Numerical Translation with Large Language Models

📅 2025-01-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically exposes critical reliability deficiencies in large language models (LLMs) when translating large-unit numerals (e.g., million, billion, and the Chinese unit yi, 100 million) between Chinese and English. Addressing a gap in prior work, it introduces the first bilingual numerical translation benchmark covering ten realistic business scenarios, accompanied by multi-dimensional error analysis and model behavioral diagnostics. Experiments reveal mistranslation rates as high as 20% among mainstream open-source LLMs (e.g., Llama3.1-8B). The authors propose three targeted mitigation strategies: explicit unit annotation, stepwise numeric parsing, and context-constrained fine-tuning, each of which demonstrably reduces error rates. This work fills a fundamental gap in robustness research on numeral translation and delivers a reproducible evaluation framework alongside practical, implementable optimization pathways for high-fidelity cross-lingual numerical conversion.
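The page does not describe how explicit unit annotation is implemented; one plausible reading is a preprocessing pass that appends the literal digit value after each large-unit numeral before the text reaches the model. A minimal sketch under that assumption (the function name, annotation format, and unit table are illustrative, not the paper's):

```python
import re

# Scale of each Chinese large unit (assumed mapping; the paper's
# exact annotation scheme is not specified on this page).
UNITS = {"万": 10_000, "亿": 100_000_000}

def annotate_units(text: str) -> str:
    """Append the literal value after each large-unit numeral,
    e.g. '3.2亿' -> '3.2亿 (=320,000,000)'."""
    def repl(m: re.Match) -> str:
        value = float(m.group(1)) * UNITS[m.group(2)]
        return f"{m.group(0)} (={int(value):,})"
    return re.sub(r"(\d+(?:\.\d+)?)([万亿])", repl, text)

print(annotate_units("该公司营收达3.2亿元"))  # 该公司营收达3.2亿 (=320,000,000)元
```

The idea is that spelling out the digits removes the unit-conversion step the model otherwise has to perform implicitly during translation.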

📝 Abstract
The inaccurate translation of numbers can lead to significant security issues, ranging from financial setbacks to medical inaccuracies. While large language models (LLMs) have made significant advancements in machine translation, their capacity for translating numbers has not been thoroughly explored. This study focuses on evaluating the reliability of LLM-based machine translation systems when handling numerical data. To systematically test the numerical translation capabilities of current open-source LLMs, we have constructed a Chinese-English numerical translation dataset based on real business data, encompassing ten types of numerical translation. Experiments on the dataset indicate that errors in numerical translation are a common issue, with most open-source LLMs faltering when faced with our test scenarios. Especially for numerical types involving large units like "million", "billion", and "yi" (100 million), even the latest Llama3.1-8B model can have error rates as high as 20%. Finally, we introduce three potential strategies to mitigate numerical mistranslations for large units.
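A structural reason these large units are error-prone is that English groups digits by thousands (thousand, million, billion) while Chinese groups by ten-thousands (万 = 10^4, 亿 = 10^8), so "yi" has no single-word English counterpart: 1 yi is 100 million, but 10 yi is 1 billion. A minimal sketch of the required conversion (illustrative only, not the paper's method):

```python
# English large units by scale, largest first.
EN_UNITS = [(10**9, "billion"), (10**6, "million"), (10**3, "thousand")]

def yi_to_english(count: float) -> str:
    """Render a quantity given in 亿 (units of 10^8) with English units."""
    value = count * 10**8
    for scale, name in EN_UNITS:
        if value >= scale:
            return f"{value / scale:g} {name}"
    return f"{value:g}"

print(yi_to_english(1))   # 100 million
print(yi_to_english(12))  # 1.2 billion
```

The crossing of the 10^9 boundary between the two calls is exactly the kind of regrouping a translator, human or model, must get right.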
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Numeric Translation Accuracy
Error Prevention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Numerical Translation Accuracy
Error Reduction Techniques
Wei Tang
Huawei Translation Services Center, China
Jiawei Yu
Xiamen University
Speech, Natural Language Processing
Yuang Li
2012 Lab, Huawei
Speech, NLP
Yanqing Zhao
Huawei
AI, MT
Weidong Zhang
Samsung Research America
Computer Vision, Image Processing
Wei Feng
Huawei Translation Services Center, China
Min Zhang
Huawei Translation Services Center, China
Hao Yang
Huawei Translation Services Center, China