HAPS: Hierarchical LLM Routing with Joint Architecture and Parameter Search

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses a key limitation of existing large language model (LLM) routing approaches, which typically select only the model architecture while neglecting the parameter configurations that are crucial for task performance. To overcome this, the authors propose HAPS, a framework that, for the first time, jointly optimizes both architecture and parameters in LLM routing. HAPS employs a hierarchical routing mechanism: a high-level router selects the architecture, while a low-level router searches for the best parameter configuration via a parameter generation network, enabling synergistic co-optimization. The framework features parameter sharing between the two routers, a reward-augmented training objective, and end-to-end trainability. Experiments on two mainstream benchmarks show that HAPS significantly outperforms strong routing baselines.

📝 Abstract
Large language model (LLM) routing aims to exploit the specialized strengths of different LLMs for diverse tasks. However, existing approaches typically focus on selecting LLM architectures while overlooking parameter settings, which are critical for task performance. In this paper, we introduce HAPS, a hierarchical LLM routing framework that jointly searches over model architectures and parameters. Specifically, we use a high-level router to select among candidate LLM architectures, and then search for the optimal parameters for the selected architectures based on a low-level router. We design a parameter generation network to share parameters between the two routers to mutually enhance their capabilities. In the training process, we design a reward-augmented objective to effectively optimize our framework. Experiments on two commonly used benchmarks show that HAPS consistently outperforms strong routing baselines. We have released our code at https://github.com/zihangtian/HAPS.
Problem

Research questions and friction points this paper is trying to address.

LLM routing
model architecture
parameter optimization
task performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM routing
joint architecture and parameter search
hierarchical routing
parameter generation network
reward-augmented optimization