Dynamic Mix Precision Routing for Efficient Multi-step LLM Interaction

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of balancing efficiency and performance in multi-step interactive reasoning with large language models (LLMs) for long-horizon decision-making tasks. The authors propose the first dynamic mixed-precision routing framework, which adaptively selects between high- and low-precision quantized models at each reasoning step based on that step's sensitivity to precision. Sensitivity is identified via a KL divergence–based supervised learning approach, and the routing policy is then refined with Group-Relative Policy Optimization (GRPO). Evaluated on the ALFWorld benchmark, the method outperforms both single-precision baselines and heuristic routing strategies, achieving a superior trade-off between inference cost and task accuracy.
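The KL-divergence-based sensitivity idea in the summary can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the toy distributions, and the 0.1 threshold are all assumptions; the core idea is flagging a step as precision-sensitive when the low-precision model's next-token distribution diverges too far from the high-precision one.

```python
# Hypothetical sketch: scoring a step's precision sensitivity via the KL
# divergence between high- and low-precision next-token distributions.
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def is_precision_sensitive(p_high, p_low, threshold=0.1):
    """Flag a step as sensitive when quantization shifts the output
    distribution by more than an (illustrative) KL threshold."""
    return kl_divergence(p_high, p_low) > threshold

# Nearly identical distributions: quantization is harmless at this step.
print(is_precision_sensitive([0.7, 0.2, 0.1], [0.68, 0.22, 0.10]))  # -> False
```

Such per-step labels could then serve as supervision targets for the router's first training stage.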

📝 Abstract
Large language models (LLMs) achieve strong performance in long-horizon decision-making tasks through multi-step interaction and reasoning at test time. While practitioners commonly believe that a higher task success rate necessitates a larger and stronger LLM, multi-step interaction with a large LLM incurs prohibitive inference cost. To address this problem, we explore the use of low-precision quantized LLMs in the long-horizon decision-making process. Based on the observation of diverse sensitivities among interaction steps, we propose a dynamic mix-precision routing framework that adaptively selects between high-precision and low-precision LLMs at each decision step. The router is trained via a two-stage pipeline, consisting of KL-divergence-based supervised learning that identifies precision-sensitive steps, followed by Group-Relative Policy Optimization (GRPO) to further improve task success rates. Experiments on ALFWorld demonstrate that our approach achieves a substantial improvement in the accuracy-cost trade-off over single-precision baselines and heuristic routing methods.
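The per-step routing loop described in the abstract can be sketched as below. Everything here is a placeholder: the environment, router, and model interfaces, the step budget, and the 1.0-vs-0.25 relative cost are illustrative assumptions, not the paper's API or measurements.

```python
# Hypothetical sketch of dynamic mix-precision routing over one episode:
# at each interaction step a learned router picks the high- or
# low-precision model, trading accuracy against inference cost.
def run_episode(env, router, model_high, model_low, max_steps=30):
    obs = env.reset()
    total_cost = 0.0
    for _ in range(max_steps):
        use_high = router(obs)                    # learned routing decision
        model = model_high if use_high else model_low
        total_cost += 1.0 if use_high else 0.25   # illustrative relative cost
        action = model(obs)
        obs, done = env.step(action)
        if done:
            break
    return total_cost
```

The design point is that routing happens per decision step rather than per task, so cheap low-precision calls handle insensitive steps while the full-precision model is reserved for the steps that need it.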
Problem

Research questions and friction points this paper is trying to address.

large language models
multi-step interaction
inference cost
long-horizon decision-making
model precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

dynamic mix-precision routing
quantized LLM
multi-step interaction
KL-divergence-based supervised learning
Group-Relative Policy Optimization
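The GRPO stage listed above centers on group-relative advantages: rewards from a group of rollouts are normalized against that group's own mean and standard deviation rather than a learned value baseline. The sketch below shows only that normalization step, a simplification of the full GRPO objective, with illustrative names.

```python
# Hypothetical sketch of GRPO's group-relative advantage computation:
# each rollout's reward is standardized within its sampled group.
import statistics

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize a group of rollout rewards to zero mean, unit variance."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Successful rollouts (reward 1.0) get positive advantages,
# failed ones (reward 0.0) get negative advantages.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

In the routing setting, the group would be multiple routed episodes of the same task, so the router is pushed toward precision schedules that succeed more often than the group average.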