RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the proliferation of LLM routers and the lack of standardized evaluation protocols, this paper introduces the first comprehensive benchmarking platform for LLM routers. Methodologically, we construct a hierarchical, multi-domain test suite with fine-grained difficulty stratification; design a multi-dimensional evaluation framework covering accuracy, cost-efficiency, robustness, and other key criteria; and implement an automated evaluation pipeline with dynamic leaderboard updates. Our core contributions are threefold: (1) the first standardized evaluation framework for LLM routers; (2) the release of the inaugural open-source LLM router leaderboard; and (3) a principled data construction methodology coupled with an end-to-end automated benchmarking pipeline. The platform is publicly available and will be fully open-sourced, providing the research community with a reproducible, extensible infrastructure to advance the rigorous, standardized development of LLM routing technologies.

📝 Abstract
Today's LLM ecosystem comprises a wide spectrum of models that differ in size, capability, and cost. No single model is optimal for all scenarios; hence, LLM routers have become essential for selecting the most appropriate model under varying circumstances. However, the rapid emergence of various routers makes choosing the right one increasingly challenging. To address this problem, we need a comprehensive router comparison and a standardized leaderboard, similar to those available for models. In this work, we introduce RouterArena, the first open platform enabling comprehensive comparison of LLM routers. RouterArena provides (1) a principled, carefully constructed dataset with broad knowledge-domain coverage, (2) distinguishable difficulty levels within each domain, (3) an extensive list of evaluation metrics, and (4) an automated framework for leaderboard updates. Leveraging our framework, we have produced the initial leaderboard with a detailed metrics comparison, as shown in Figure 1. We will make our platform open to the public soon.
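To make the evaluation idea above concrete, here is a minimal sketch of how a router leaderboard might score competing routers on accuracy and cost, two of the metric axes the abstract mentions. All names, data, and the ranking rule are hypothetical illustrations, not RouterArena's actual pipeline.

```python
from dataclasses import dataclass

@dataclass
class RoutedResult:
    correct: bool   # did the model the router selected answer correctly?
    cost: float     # dollar cost of that model call

# Hypothetical per-query logs for two routers over the same query set;
# a real pipeline would produce these by replaying the benchmark through each router.
logs = {
    "router_a": [RoutedResult(True, 0.002), RoutedResult(True, 0.010), RoutedResult(False, 0.002)],
    "router_b": [RoutedResult(True, 0.010), RoutedResult(False, 0.010), RoutedResult(True, 0.010)],
}

def score(results):
    """Return (accuracy, mean cost) for one router's run."""
    acc = sum(r.correct for r in results) / len(results)
    mean_cost = sum(r.cost for r in results) / len(results)
    return acc, mean_cost

# Rank by higher accuracy first, breaking ties with lower mean cost.
leaderboard = sorted(
    ((name, *score(rs)) for name, rs in logs.items()),
    key=lambda row: (-row[1], row[2]),
)
for name, acc, cost in leaderboard:
    print(f"{name}: accuracy={acc:.2f}, mean cost=${cost:.4f}")
```

In this toy example both routers reach the same accuracy, so the cheaper one ranks first; a real leaderboard would aggregate many more metrics (robustness, latency, etc.) across domains and difficulty levels.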
Problem

Research questions and friction points this paper is trying to address.

Comprehensive comparison of diverse LLM routers
Standardized evaluation framework for router selection
Automated leaderboard for router performance metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open platform for comprehensive LLM router comparison
Dataset with broad domain coverage and difficulty levels
Automated framework for leaderboard updates and metrics
Authors
Yifan Lu (Rice University)
Rixin Liu (Rice University)
Jiayi Yuan (Rice University)
Xingqi Cui (Rice University)
Shenrun Zhang (Rice University)
Hongyi Liu (Rice University)
Jiarong Xing (UC Berkeley; Rice University)