SCE: Scalable Consistency Ensembles Make Blackbox Large Language Model Generation More Reliable

📅 2025-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the unreliability of black-box large language model (LLM) outputs and the high computational cost of conventional ensemble methods, this paper proposes the Scalable Consistency Ensemble (SCE) framework. Methodologically, SCE introduces three key innovations: (1) SCE-CHECK—a novel semantic consistency–based black-box discriminator; (2) SCE-FUSION—a hierarchical response fusion strategy that jointly leverages syntactic and semantic coherence; and (3) the "You Only Prompt Once" (YOPO) technique, which reduces the inference complexity of pairwise consistency checking from *O*(*n*²) to *O*(1), dramatically enhancing scalability. Evaluated across diverse benchmarks, SCE achieves comparable or improved output reliability while substantially reducing computational overhead, jointly optimizing for both high reliability and high efficiency in black-box LLM ensembling.
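The complexity claim above can be made concrete with a small sketch. This is not the paper's implementation: `judge_pair` stands in for a single black-box LLM consistency query (here stubbed as a case-insensitive string match), and the YOPO-style function simulates packing all candidate responses into one combined prompt so that only a single query is issued regardless of ensemble size.

```python
# Hedged sketch: naive pairwise consistency checking (O(n^2) judge queries)
# vs. a YOPO-style single combined prompt (O(1) queries). All names here are
# illustrative, not taken from the paper's code.

from itertools import combinations

def judge_pair(a: str, b: str) -> bool:
    """Hypothetical stand-in for one LLM semantic-equivalence query."""
    return a.strip().lower() == b.strip().lower()

def pairwise_check(responses):
    """Naive scheme: one judge query per unordered pair -> O(n^2) queries."""
    queries = 0
    scores = {r: 0 for r in responses}
    for a, b in combinations(responses, 2):
        queries += 1
        if judge_pair(a, b):
            scores[a] += 1
            scores[b] += 1
    return scores, queries

def yopo_check(responses):
    """YOPO-style scheme: all responses go into one prompt -> O(1) queries.
    The single combined 'query' is simulated by grouping equivalent answers."""
    queries = 1  # one combined prompt to the judge LLM
    groups = {}
    for r in responses:
        groups.setdefault(r.strip().lower(), []).append(r)
    # Each response's consistency score = number of other equivalent responses.
    scores = {r: len(groups[r.strip().lower()]) - 1 for r in responses}
    return scores, queries
```

With four candidate responses, the pairwise scheme issues 6 judge queries while the YOPO-style scheme issues 1, yet both assign the same consistency scores; the highest-scoring (most consistent) responses would then feed into the fusion step.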

📝 Abstract
Large language models (LLMs) have demonstrated remarkable performance, yet their diverse strengths and weaknesses prevent any single LLM from achieving dominance across all tasks. Ensembling multiple LLMs is a promising approach to generating reliable responses, but conventional ensembling frameworks suffer from high computational overheads. This work introduces Scalable Consistency Ensemble (SCE), an efficient framework for ensembling LLMs by prompting consistent outputs. The SCE framework systematically evaluates and integrates outputs to produce a cohesive result through two core components: SCE-CHECK, a mechanism that gauges the consistency between response pairs via semantic equivalence; and SCE-FUSION, which adeptly merges the highest-ranked consistent responses from SCE-CHECK to optimize collective strengths and mitigate potential weaknesses. To improve scalability with multiple inference queries, we further propose "You Only Prompt Once" (YOPO), a novel technique that reduces the inference complexity of pairwise comparison from quadratic to constant time. We perform extensive empirical evaluations on diverse benchmark datasets to demonstrate SCE's effectiveness. Notably, the SCE-CHECK component outperforms conventional baselines with enhanced performance and a significant reduction in computational overhead.
Problem

Research questions and friction points this paper is trying to address.

Ensemble multiple LLMs to enhance reliability and performance.
Reduce computational overhead in LLM ensembling frameworks.
Improve scalability with efficient pairwise comparison techniques.
Innovation

Methods, ideas, or system contributions that make the work stand out.

SCE framework integrates LLMs efficiently
SCE-CHECK ensures semantic consistency in responses
YOPO reduces inference complexity significantly