Enabling Fine-Grained Operating Points for Black-Box LLMs

📅 2025-10-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Black-box large language models (LLMs) face limitations in decision-making tasks requiring precise metric constraints (e.g., precision ≥ 95%), because their coarse-grained, rounded probability outputs hinder fine-grained operating point control. We observe that while their verbalized probabilities are semantically informative, they suffer from insufficient numerical resolution. To address this, we propose an operating point enhancement method with no added inference cost (combining prompt engineering, uncertainty estimation, and confidence-guided calibration) to correct output bias and expand the usable threshold space. Extensive experiments across 11 datasets and three major black-box LLM families show that our approach significantly increases both the number and diversity of achievable operating points. Crucially, it enables more accurate and robust metric-constrained decision-making without performance degradation, matching or outperforming state-of-the-art baselines.

📝 Abstract
Black-box Large Language Models (LLMs) provide practical and accessible alternatives to other machine learning methods, as they require minimal labeled data and machine learning expertise to develop solutions for various decision-making problems. However, for applications that require operating under constraints on specific metrics (e.g., precision $\geq$ 95%), decision making with black-box LLMs remains unfavorable due to their low numerical output cardinalities. This results in limited control over their operating points, preventing fine-grained adjustment of their decision-making behavior. In this paper, we study using black-box LLMs as classifiers, focusing on efficiently improving their operational granularity without performance loss. Specifically, we first investigate the reasons behind their low-cardinality numerical outputs and show that they are biased towards generating rounded but informative verbalized probabilities. Then, we experiment with standard prompt engineering, uncertainty estimation, and confidence elicitation techniques, and observe that they do not effectively improve operational granularity without sacrificing performance or increasing inference cost. Finally, we propose efficient approaches to significantly increase the number and diversity of available operating points. Our proposed approaches provide finer-grained operating points and achieve comparable to or better performance than the benchmark methods across 11 datasets and 3 LLMs.
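The cardinality limitation the abstract describes can be made concrete with a small sketch (not the paper's code; names and data are illustrative): the number of distinct operating points a classifier supports is bounded by the number of distinct scores it emits, so an LLM that verbalizes rounded probabilities offers only a handful of usable thresholds.

```python
# Illustrative sketch: distinct scores bound the available operating points.

def operating_points(scores):
    """Return the distinct thresholds a set of classifier scores supports."""
    return sorted(set(scores))

# An LLM verbalizing rounded probabilities ("about 90%") yields few distinct scores:
rounded = [0.7, 0.9, 0.9, 0.8, 0.9, 0.7, 0.8, 1.0, 0.9, 0.8]
# A model emitting finer-grained scores supports many more thresholds:
fine = [0.71, 0.93, 0.88, 0.82, 0.97, 0.74, 0.85, 0.99, 0.91, 0.83]

print(len(operating_points(rounded)))  # 4 distinct operating points
print(len(operating_points(fine)))     # 10 distinct operating points
```

With only four distinct scores, the rounded model's precision/recall trade-off can be adjusted in at most four steps, which is why the paper targets increasing the number and diversity of these points.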
Problem

Research questions and friction points this paper is trying to address.

Increasing operating point granularity for black-box LLM classifiers
Overcoming low cardinality limitations in LLM numerical outputs
Enabling fine-grained adjustment of decision-making behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

Elicits informative verbalized probabilities from black-box LLMs
Increases number and diversity of available operating points
Achieves fine-grained control without performance loss
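The metric-constrained use case behind these contributions can be sketched as a simple threshold search (a hedged illustration, not the paper's algorithm; function and data are hypothetical): among the operating points a classifier's scores support, pick the lowest threshold whose validation precision meets the constraint.

```python
# Hypothetical sketch of metric-constrained threshold selection:
# choose the smallest threshold whose precision meets the target.

def pick_threshold(scores, labels, min_precision=0.95):
    """Return the smallest threshold with precision >= min_precision, or None."""
    for t in sorted(set(scores)):
        preds = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p and y)
        fp = sum(1 for p, y in zip(preds, labels) if p and not y)
        if tp + fp > 0 and tp / (tp + fp) >= min_precision:
            return t
    return None

scores = [0.7, 0.8, 0.8, 0.9, 0.9, 1.0]  # coarse, rounded LLM scores
labels = [0,   0,   1,   1,   1,   1]
print(pick_threshold(scores, labels))  # 0.9
```

With coarse scores there are only four candidate thresholds, so the chosen one may sacrifice more recall than necessary; finer-grained operating points let the same search land closer to the constraint boundary.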