SelectLLM: Query-Aware Efficient Selection Algorithm for Large Language Models

πŸ“… 2024-08-16
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ€– AI Summary
To address the limited generalization and suboptimal performance of single large language models (LLMs) on complex, domain-specific tasks, this paper proposes a query-aware, lightweight LLM selection algorithm that dynamically selects an optimal subset from a large pool of LLMs for collaborative inference. The key contributions are threefold: (1) a novel adaptive selection mechanism jointly driven by multi-label classification and confidence estimation; (2) derivation of a theoretical Oracle upper bound coupled with linguistic attribution analysis to enhance interpretability; and (3) integration of inference-latency optimization strategies. Experiments on GSM8K and MMLU demonstrate latency reductions of 13% and 70%, respectively, while matching the accuracy of the best-performing monolithic model of comparable scale and significantly outperforming existing ensemble baselines.

πŸ“ Abstract
Large language models (LLMs) have seen widespread adoption due to their remarkable performance across various applications, driving the accelerated development of a large number of diverse LLMs. However, individual LLMs show limitations in generalization and performance on complex tasks due to inherent training biases, model size constraints, and the quality or diversity of their pre-training datasets. A promising direction is to efficiently harness the diverse capabilities of multiple LLMs to overcome these individual limitations. To this end, we introduce a novel LLM selection algorithm called SelectLLM, which efficiently directs input queries to the most suitable subset of LLMs from a large pool, ensuring that the selected models collectively provide accurate responses. SelectLLM employs a multi-label classifier, together with a policy based on the classifier's predictions and confidence scores, to select an optimal, query-aware, and lightweight subset of LLMs. Our findings indicate that the proposed model outperforms existing ensemble-based baselines and achieves performance competitive with similarly sized top-performing LLMs while maintaining efficiency. Specifically, it achieves substantial reductions in inference latency on two challenging reasoning benchmarks: 13% on GSM8K and 70% on MMLU, compared to the top-performing baselines. We also establish a theoretical upper bound via an Oracle over the LLM pool and conduct an in-depth linguistic analysis to understand the performance gap between the Oracle and SelectLLM.
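The selection policy described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function name, threshold value, subset cap, and model names are all assumptions. It shows the core idea of thresholding per-model confidence scores from a multi-label classifier to pick a small, query-aware subset, with a fallback to the single highest-scoring model.

```python
# Hypothetical sketch of SelectLLM-style selection (names and parameters
# assumed, not taken from the paper): a multi-label classifier has scored
# each LLM in the pool on its suitability for the current query.

def select_llms(scores, threshold=0.5, max_models=3):
    """Select models whose predicted suitability meets `threshold`,
    capped at `max_models`; fall back to the single best model so the
    query is always routed somewhere."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    chosen = [name for name, s in ranked[:max_models] if s >= threshold]
    return chosen or [ranked[0][0]]

# Example: hypothetical classifier scores for a GSM8K-style math query.
scores = {"llm-a": 0.91, "llm-b": 0.62, "llm-c": 0.18}
print(select_llms(scores))  # ['llm-a', 'llm-b']
```

Only the selected models are queried, which is where the latency savings over full-ensemble inference come from; the confidence threshold trades off subset size (cost) against coverage (accuracy).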
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Model Combination
Problem-Solving Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

SelectLLM
Multi-label Classifier
Resource Optimization