Steering LLMs toward Korean Local Speech: Iterative Refinement Framework for Faithful Dialect Translation

📅 2025-11-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses two critical challenges in Korean standard-to-dialect translation with large language models (LLMs): the "dialect gap" and the distortion inherent in n-gram-based evaluation, where verbatim copying of the source is erroneously scored as high-quality translation. To tackle these, we propose DIA-REFINE, an iterative refinement framework that couples zero-shot and in-context learning with an external dialect classifier to form a translation–verification–feedback loop. We further introduce two semantics-aware metrics, the Dialect Fidelity Score (DFS) and the Target Dialect Ratio (TDR), designed to expose spurious success cases. Experiments across multiple benchmarks demonstrate substantial improvements in dialectal naturalness and robust discrimination between illusory high scores and genuine dialect conversion. Moreover, our analysis reveals that LLMs differ in their responsiveness to iterative refinement. DIA-REFINE thus establishes an interpretable, verifiable paradigm for low-resource dialect translation.

📝 Abstract
Standard-to-dialect machine translation remains challenging due to a persistent dialect gap in large language models and evaluation distortions inherent in n-gram metrics, which favor source copying over authentic dialect translation. In this paper, we propose the dialect refinement (DIA-REFINE) framework, which guides LLMs toward faithful target dialect outputs through an iterative loop of translation, verification, and feedback using external dialect classifiers. To address the limitations of n-gram-based metrics, we introduce the dialect fidelity score (DFS) to quantify linguistic shift and the target dialect ratio (TDR) to measure the success of dialect translation. Experiments on Korean dialects across zero-shot and in-context learning baselines demonstrate that DIA-REFINE consistently enhances dialect fidelity. The proposed metrics distinguish between False Success cases, where high n-gram scores obscure failures in dialectal translation, and True Attempt cases, where genuine attempts at dialectal translation yield low n-gram scores. We also observed that models exhibit varying degrees of responsiveness to the framework, and that integrating in-context examples further improves the translation of dialectal expressions. Our work establishes a robust framework for goal-directed, inclusive dialect translation, providing both rigorous evaluation and critical insights into model performance.
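The translation–verification–feedback loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `llm_translate` and `classify_dialect` are hypothetical stand-ins for an LLM translation call and the external dialect classifier.

```python
def refine_translation(source, target_dialect, llm_translate, classify_dialect,
                       max_rounds=3):
    """Iteratively translate until the external classifier verifies the output.

    `llm_translate(source, target_dialect, feedback=...)` and
    `classify_dialect(text)` are assumed interfaces, not the paper's API.
    """
    candidate, feedback = None, None
    for _ in range(max_rounds):
        # Translate, optionally conditioning on feedback from the last round.
        candidate = llm_translate(source, target_dialect, feedback=feedback)
        predicted = classify_dialect(candidate)
        if predicted == target_dialect:
            return candidate, True  # verified as the target dialect
        # Feed the classifier's verdict back into the next prompt.
        feedback = (f"The output was classified as '{predicted}', "
                    f"not '{target_dialect}'. Please revise.")
    return candidate, False  # best effort after max_rounds
```

The key design point is that verification is delegated to an external classifier rather than to the LLM itself, so each round receives a concrete, checkable signal instead of self-assessment.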
Problem

Research questions and friction points this paper is trying to address.

Addressing the dialect gap in LLMs for Korean local speech translation
Overcoming evaluation distortions from n-gram metrics in dialect translation
Achieving faithful dialect translation through an iterative refinement framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Iterative refinement framework using dialect classifiers
Dialect fidelity score quantifying linguistic shifts
Target dialect ratio measuring translation success
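One plausible reading of the Target Dialect Ratio (TDR) listed above is the fraction of system outputs that the external classifier labels as the target dialect; the paper's exact formulation may differ, so the sketch below is an assumption, not the authors' definition.

```python
def target_dialect_ratio(outputs, target_dialect, classify_dialect):
    """Hypothetical TDR: share of outputs classified as the target dialect.

    `classify_dialect(text)` is an assumed external-classifier interface.
    """
    if not outputs:
        return 0.0
    hits = sum(1 for text in outputs
               if classify_dialect(text) == target_dialect)
    return hits / len(outputs)
```

Because the score depends only on classifier decisions, it cannot be inflated by verbatim source copying the way n-gram overlap can.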
Keunhyeung Park
Chung-Ang University
Seunguk Yu
Chung-Ang University
Youngbin Kim
Senior Researcher, ETRI (Electronics and Telecommunications Research Institute)