RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment

📅 2026-04-24
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses two problems in hybrid deployment of large language models for machine translation: the high computational cost of the large model, and the difficulty existing sample routing strategies have in estimating how much quality the large model actually adds for a given request. The authors formulate routing as a budget-constrained allocation problem and identify the marginal gain of the large model over the small model as the optimal routing signal. Their router predicts this gain with a lightweight probe on the small model's prompt representations, so it needs neither an external model nor hypothesis decoding, and a guarded variant adds a safeguard against routing decisions that would cause severe quality losses. Experiments show that the method outperforms heuristic and quality/difficulty estimation baselines, achieving a better quality-cost Pareto frontier while mitigating the risk of performance regression.
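
The budget-constrained formulation can be written out as a small selection problem. The following is one possible formalization, not the paper's own notation: the symbols $x_i$, $g_i$, $q_L$, $q_S$, $B$, and $n$ are introduced here for illustration, and it assumes every large-model call costs the same.

```latex
% Assumed notation (not taken from the paper):
%   s_i      : the i-th source request, i = 1..n
%   q_L, q_S : translation quality of the large / small model on a request
%   g_i      : marginal gain of the large model on request i
%   x_i      : 1 if request i is routed to the large model, else 0
%   B        : fraction of requests the budget allows to be routed
\begin{align}
  g_i &= q_L(s_i) - q_S(s_i), \\
  \max_{x \in \{0,1\}^n} \; & \sum_{i=1}^{n} x_i \, g_i
  \quad \text{s.t.} \quad \sum_{i=1}^{n} x_i \le \lfloor B n \rfloor .
\end{align}
```

Under the uniform-cost assumption, this objective is maximized by routing the requests with the largest positive $g_i$, up to $\lfloor B n \rfloor$ of them. That is why a predictor of the gain, rather than of the absolute quality of either model, is the natural routing signal.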

📝 Abstract
Large Language Models (LLMs) have achieved remarkable performance in Machine Translation (MT), but deploying them at scale remains prohibitively expensive. A widely adopted remedy is the hybrid system paradigm, which balances cost and quality by serving most requests with a small model and selectively routing a fraction to a large model. However, existing routing strategies often rely on heuristics, external predictors, or absolute quality estimation, none of which captures whether the large model actually provides a worthwhile improvement over the small one. In this paper, we formulate routing as a budget allocation problem and identify marginal gain, i.e., the large model's improvement over the small model, as the optimal signal for budgeted decisions. Building on this, we propose RouteLMT (routing for LLM-based MT), an efficient in-model router that predicts this expected gain by probing the small translator's prompt-token representations, without requiring external models or hypothesis decoding. Extensive experiments demonstrate that RouteLMT outperforms heuristic and quality/difficulty estimation baselines, achieving a superior quality-budget Pareto frontier. Furthermore, we analyze regression risks and show that a simple guarded variant can mitigate severe quality losses.
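
The budgeted decision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `route_by_predicted_gain`, the `budget_fraction` parameter, and the `min_gain` threshold used as a guard are all hypothetical, and the predicted gains are assumed to come from some upstream predictor (in the paper, a probe on the small translator's prompt-token representations, which is not reproduced here). The sketch also assumes every large-model call costs the same.

```python
import numpy as np


def route_by_predicted_gain(pred_gain, budget_fraction, min_gain=0.0):
    """Return a boolean mask; True means "send this request to the large model".

    pred_gain       : predicted marginal gain per request (assumed given).
    budget_fraction : share of requests the budget allows to be routed.
    min_gain        : hypothetical guard threshold; requests at or below it
                      stay on the small model even if budget remains.
    """
    pred_gain = np.asarray(pred_gain, dtype=float)
    n = pred_gain.size
    k = int(np.floor(budget_fraction * n))  # maximum number of large-model calls
    mask = np.zeros(n, dtype=bool)
    if k == 0:
        return mask
    # Indices of the k largest predicted gains (stable sort keeps ties deterministic).
    top = np.argsort(-pred_gain, kind="stable")[:k]
    # Guard: drop candidates whose predicted gain does not clear the threshold.
    top = top[pred_gain[top] > min_gain]
    mask[top] = True
    return mask


gains = [0.30, -0.10, 0.05, 0.80, 0.00]
mask = route_by_predicted_gain(gains, budget_fraction=0.4)
print(mask.tolist())  # [True, False, False, True, False]
```

With a 40% budget over five requests, at most two are routed, and the two with the highest predicted gain are chosen. Raising `min_gain` trades unused budget for a lower chance of routing a request on which the large model would not help; the paper's guarded variant may be defined differently.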
Problem

Research questions and friction points this paper is trying to address.

LLM-based Machine Translation
Hybrid Deployment
Sample Routing
Budget Allocation
Marginal Gain
Innovation

Methods, ideas, or system contributions that make the work stand out.

sample routing
marginal gain
hybrid LLM translation
budgeted deployment
in-model router