🤖 AI Summary
The rapid proliferation of English technical terms in cutting-edge domains (e.g., AI, quantum computing) has outpaced multilingual standardization efforts, leading to terminological inconsistency and ambiguity across languages.
Method: This paper proposes a term-level back-translation framework powered by large language models (GPT-4, DeepSeek, Grok), implementing a multi-path "retrieve–generate–verify–optimize" workflow (e.g., EN→ZHcn→ZHtw→EN). It integrates BLEU scoring, term-specific accuracy metrics, and path-aware semantic modeling.
Contribution/Results: We introduce the first term-level consistency verification mechanism and reframe back-translation as an interpretable, evolvable dynamic semantic embedding. Experiments show >90% term consistency (exact plus semantic matches), 100% back-translation accuracy for Portuguese, and mean BLEU scores above 0.45, supporting scalable, human-in-the-loop terminology governance.
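The serial back-translation path above (EN→ZHcn→ZHtw→EN) can be sketched as a simple round-trip consistency check. This is a minimal, self-contained illustration: the `translate()` stub and its tiny lookup table are placeholders for real LLM API calls (GPT-4, DeepSeek, Grok) and are not part of the paper; a real system would also test semantic (not just exact) equivalence.

```python
# Illustrative lookup table standing in for LLM translation calls (assumption).
TRANSLATIONS = {
    ("transformer", "en", "zh_cn"): "变换器",
    ("变换器", "zh_cn", "zh_tw"): "變換器",
    ("變換器", "zh_tw", "en"): "transformer",
}

def translate(term: str, src: str, tgt: str) -> str:
    """Stub for an LLM translation call; unknown terms pass through unchanged."""
    return TRANSLATIONS.get((term, src, tgt), term)

def back_translate(term: str, path: list[str]) -> str:
    """Run a term through a serial language path, e.g. en -> zh_cn -> zh_tw -> en."""
    current = term
    for src, tgt in zip(path, path[1:]):
        current = translate(current, src, tgt)
    return current

def term_consistent(term: str, path: list[str]) -> bool:
    """Exact-match check; the paper additionally counts semantic matches."""
    return back_translate(term, path) == term.lower()

result = term_consistent("transformer", ["en", "zh_cn", "zh_tw", "en"])
```

A parallel route (e.g., EN→Portuguese→EN) would call `term_consistent` with a second path and compare the outcomes across routes.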
📝 Abstract
The rapid growth of English technical terms challenges traditional expert-driven standardization, especially in fast-evolving fields like AI and quantum computing. Manual methods struggle to ensure multilingual consistency. We propose **LLM-BT**, a back-translation framework powered by large language models (LLMs) to automate terminology verification and standardization via cross-lingual semantic alignment. Our contributions are: **(1) Term-Level Consistency Validation:** Using English → intermediate language → English back-translation, LLM-BT achieves high term consistency across models (e.g., GPT-4, DeepSeek, Grok), with case studies showing over 90% exact or semantic matches. **(2) Multi-Path Verification Workflow:** A novel "Retrieve–Generate–Verify–Optimize" pipeline integrates serial (e.g., EN → ZHcn → ZHtw → EN) and parallel (e.g., EN → Chinese/Portuguese → EN) BT routes. BLEU and term accuracy indicate strong cross-lingual robustness (BLEU > 0.45; Portuguese accuracy 100%). **(3) Back-Translation as Semantic Embedding:** BT is conceptualized as dynamic semantic embedding, revealing latent meaning trajectories. Unlike static embeddings, LLM-BT provides transparent path-based embeddings shaped by model evolution. LLM-BT transforms back-translation into an active engine for multilingual terminology standardization, enabling human–AI collaboration: machines ensure semantic fidelity, humans guide cultural interpretation. This infrastructure supports terminology governance across scientific and technological fields worldwide.