Belief-Guided Inference Control for Large Language Model Services via Verifiable Observations

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses how a black-box large language model service with a limited computational budget should choose, per request, between a low-cost default response and a higher-quality but more expensive inference path. It formulates the setting as a Partially Observable Markov Decision Process (POMDP) and builds a lightweight belief state from verifiable observation channels. The belief state integrates heterogeneous quality signals to estimate response reliability and drives a budget-aware adaptive inference policy. Experiments across diverse tasks show that the method outperforms competitive baselines in the quality-cost trade-off, in risk estimation and calibration, and in the robustness of long-horizon inference control.

📝 Abstract

In black-box large language model (LLM) services, response reliability is often only partially observable at decision time, while stronger inference pathways incur substantial computational cost. This induces a budgeted sequential decision problem: for each request, the system must decide whether the default low-cost response is sufficiently reliable or whether additional computation should be allocated to improve response quality. In this paper, we propose Verifiable Observations for Risk-aware Inference Control (Veroic), a framework for adaptive inference control in black-box LLM settings. Veroic formulates request-time control as a partially observable Markov decision process to capture partial observability and sequential budget coupling. It constructs a lightweight verifiable observation channel from the input-output pair by aggregating heterogeneous quality signals into a belief state over latent response reliability. A budget-aware policy then uses this belief state to decide whether to return the default output or trigger a higher-cost inference pathway. Experiments on diverse tasks show that Veroic achieves improved quality-cost trade-offs, stronger risk estimation and calibration, and more robust long-horizon inference control than competitive baselines.
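
The control loop described in the abstract can be illustrated with a minimal Python sketch. It is not the Veroic implementation: the logistic aggregation of signals, the threshold rule, and every name here (`aggregate_signals`, `BudgetAwarePolicy`, `escalation_cost`, `base_threshold`) are assumptions made only for illustration. The sketch shows heterogeneous quality signals being combined into a belief that the default response is reliable, and a policy that escalates only when that belief is low and the remaining budget allows it.

```python
import math
from dataclasses import dataclass


def aggregate_signals(signals, weights, bias=0.0):
    """Map quality signals to a belief in [0, 1] that the default response is reliable.

    Logistic aggregation is an assumption of this sketch, not the paper's method.
    """
    z = bias + sum(w * s for w, s in zip(weights, signals))
    return 1.0 / (1.0 + math.exp(-z))


@dataclass
class BudgetAwarePolicy:
    budget: float                # remaining compute budget (hypothetical units)
    escalation_cost: float       # cost of one call to the stronger pathway
    base_threshold: float = 0.5  # escalate below this belief when budget is ample

    def decide(self, belief, remaining_requests):
        """Return 'escalate' or 'default' and charge the budget on escalation."""
        if self.budget < self.escalation_cost:
            return "default"  # cannot afford the stronger pathway
        # Fraction of the remaining requests that could still be escalated.
        affordable = self.budget / (self.escalation_cost * max(remaining_requests, 1))
        # A scarcer budget lowers the threshold, so only the least
        # reliable responses trigger the expensive pathway.
        threshold = self.base_threshold * min(1.0, affordable)
        if belief < threshold:
            self.budget -= self.escalation_cost
            return "escalate"
        return "default"


if __name__ == "__main__":
    policy = BudgetAwarePolicy(budget=3.0, escalation_cost=1.0)
    # Hypothetical signals, e.g. a format check and a self-consistency score.
    belief = aggregate_signals([0.2, 0.1], weights=[2.0, 2.0], bias=-1.5)
    print(policy.decide(belief, remaining_requests=3), policy.budget)
```

Coupling the threshold to the remaining budget is one simple way to mimic the sequential budget coupling the abstract mentions. A learned policy over the belief state, as the paper proposes, would replace this hand-written rule.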

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Inference Control

Partial Observability

Budgeted Decision Making

Response Reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Verifiable Observations

Inference Control

Partially Observable Markov Decision Process