🤖 AI Summary
To address the proliferation of LLM routers and the lack of standardized evaluation protocols, this paper introduces RouterArena, the first comprehensive benchmarking platform for LLM routers. Methodologically, the authors construct a hierarchical, multi-domain test suite with fine-grained difficulty stratification; design a multi-dimensional evaluation framework covering accuracy, cost-efficiency, robustness, and other key criteria; and implement an automated evaluation pipeline with dynamic leaderboard updates. The core contributions are threefold: (1) the first standardized evaluation framework for LLM routers; (2) the inaugural LLM router leaderboard; and (3) a principled data construction methodology coupled with an end-to-end automated benchmarking pipeline. The authors state that the platform will be opened to the public, giving the research community reproducible, extensible infrastructure for the rigorous, standardized development of LLM routing technologies.
📝 Abstract
Today's LLM ecosystem comprises a wide spectrum of models that differ in size, capability, and cost. No single model is optimal for all scenarios; hence, LLM routers have become essential for selecting the most appropriate model under varying circumstances. However, the rapid emergence of new routers makes choosing the right one increasingly difficult. Addressing this problem requires a comprehensive router comparison and a standardized leaderboard, similar to those available for models. In this work, we introduce RouterArena, the first open platform enabling comprehensive comparison of LLM routers. RouterArena provides (1) a dataset constructed in a principled manner with broad coverage of knowledge domains, (2) distinguishable difficulty levels for each domain, (3) an extensive list of evaluation metrics, and (4) an automated framework for leaderboard updates. Using this framework, we have produced the initial leaderboard, with a detailed comparison of metrics shown in Figure 1. We will open our platform to the public soon.