🤖 AI Summary
This work addresses a key limitation of existing large language model (LLM) routing approaches, which typically select only the model architecture while neglecting the parameter configurations that are crucial for task performance. To overcome this, the authors propose HAPS, a framework that jointly optimizes both architecture and parameters in LLM routing. HAPS employs a hierarchical routing mechanism: a high-level router selects the architecture, while a low-level router searches for the best parameter configuration via a parameter generation network, enabling synergistic co-optimization. The framework features parameter sharing between the two routers, a reward-augmented training objective, and end-to-end trainability. Experiments on two mainstream benchmarks show that HAPS significantly outperforms strong existing baselines.
📝 Abstract
Large language model (LLM) routing aims to exploit the specialized strengths of different LLMs for diverse tasks. However, existing approaches typically focus on selecting LLM architectures while overlooking parameter settings, which are critical for task performance. In this paper, we introduce HAPS, a hierarchical LLM routing framework that jointly searches over model architectures and parameters. Specifically, a high-level router selects among candidate LLM architectures, and a low-level router then searches for the optimal parameters of the selected architecture. We design a parameter generation network that shares parameters between the two routers to mutually enhance their capabilities, and a reward-augmented objective to effectively optimize the framework during training. Experiments on two commonly used benchmarks show that HAPS consistently outperforms strong routing baselines. Our code is available at https://github.com/zihangtian/HAPS.
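To make the hierarchical routing idea concrete, the sketch below shows a minimal two-level router. This is an illustration only, not the authors' implementation: the function names, the candidate model pool, and the choice of decoding parameters (`temperature`, `top_p`) are all assumptions; in HAPS the low-level step is a learned parameter generation network rather than the deterministic stand-in used here.

```python
import random

# Hypothetical sketch of HAPS-style hierarchical routing (illustrative names,
# not the authors' actual API).
ARCHITECTURES = ["llm-a", "llm-b", "llm-c"]  # assumed candidate model pool

def high_level_route(query: str) -> str:
    """High-level router: pick an architecture for the query.

    A stand-in for a learned scorer; here we derive pseudo-scores from a hash.
    """
    scores = {arch: (hash((query, arch)) % 100) / 100.0 for arch in ARCHITECTURES}
    return max(scores, key=scores.get)

def low_level_route(query: str, arch: str) -> dict:
    """Low-level router: propose a parameter configuration for the architecture.

    In HAPS this would come from the parameter generation network; here we
    just draw deterministic pseudo-values for illustration.
    """
    rng = random.Random(hash((query, arch)) % 1000)
    return {
        "temperature": round(rng.uniform(0.0, 1.0), 2),
        "top_p": round(rng.uniform(0.5, 1.0), 2),
    }

def route(query: str) -> tuple[str, dict]:
    """Hierarchical routing: architecture first, then its parameters."""
    arch = high_level_route(query)
    params = low_level_route(query, arch)
    return arch, params

arch, params = route("Summarize this article.")
print(arch, params)
```

The key structural point is the two-stage decision: the parameter search is conditioned on the architecture chosen at the higher level, which is what allows the two routers to be co-optimized rather than trained in isolation.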