🤖 AI Summary
Problem: As millions of specialized large language models (LLMs) come to coexist in future AI ecosystems, selecting domain-expert models efficiently, scalably, and dynamically, without relying on static metadata, becomes a critical challenge.
Method: This paper proposes an agent-centric information access framework that models LLMs as dynamic knowledge agents. Expert models are ranked and selected online by inferring their domain expertise in real time. The framework combines retrieval-augmented generation (RAG), hierarchical model clustering, dynamic expert ranking, and multi-agent response aggregation, while balancing query cost, robustness, and scalability.
Contribution/Results: The work introduces the “agent-level retrieval” paradigm, aimed at coordinated scheduling across large populations of heterogeneous LLMs; its evaluation framework constructs and assesses thousands of specialized models, with the potential to scale toward millions. Experiments across diverse domains demonstrate significant improvements in answer accuracy and resource utilization efficiency. The framework provides a scalable infrastructure for evaluation and orchestration in large-scale specialized model ecosystems.
📝 Abstract
As large language models (LLMs) become more specialized, we envision a future where millions of expert LLMs exist, each trained on proprietary data and excelling in specific domains. In such a system, answering a query requires selecting a small subset of relevant models, querying them efficiently, and synthesizing their responses. This paper introduces a framework for agent-centric information access, where LLMs function as knowledge agents that are dynamically ranked and queried based on their demonstrated expertise. Unlike traditional document retrieval, this approach requires inferring expertise on the fly, rather than relying on static metadata or predefined model descriptions. This shift introduces several challenges, including efficient expert selection, cost-effective querying, response aggregation across multiple models, and robustness against adversarial manipulation. To address these issues, we propose a scalable evaluation framework that leverages retrieval-augmented generation and clustering techniques to construct and assess thousands of specialized models, with the potential to scale toward millions.
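The select, query, and synthesize loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: the names `Agent`, `expertise`, `rank_agents`, and `answer_query` are hypothetical, expertise is approximated by simple token overlap between the query and each agent's past responses (a stand-in for whatever learned signal a real system would use), and aggregation is a plain majority vote.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercased whitespace tokens; a crude stand-in for a real embedding."""
    return set(text.lower().split())


@dataclass
class Agent:
    """Hypothetical wrapper around one expert LLM."""
    name: str
    answer: Callable[[str], str]  # assumed: forwards the query to the model
    past_responses: List[str] = field(default_factory=list)  # demonstrated output

    def expertise(self, query: str) -> float:
        """Jaccard overlap between the query and everything the agent has said.

        Expertise is inferred from observed behavior, not from static metadata.
        """
        q = tokenize(query)
        seen: Set[str] = set().union(*(tokenize(r) for r in self.past_responses))
        union = q | seen
        return len(q & seen) / len(union) if union else 0.0


def rank_agents(agents: List[Agent], query: str, k: int) -> List[Agent]:
    """Return the k agents with the highest inferred expertise for this query."""
    ranked = sorted(agents, key=lambda a: a.expertise(query), reverse=True)
    return ranked[:k]


def answer_query(agents: List[Agent], query: str, k: int = 2) -> str:
    """Query only the top-k agents and aggregate their answers by majority vote."""
    selected = rank_agents(agents, query, k)
    responses = [a.answer(query) for a in selected]
    # Record each response so future rankings reflect demonstrated behavior.
    for agent, response in zip(selected, responses):
        agent.past_responses.append(response)
    # Ties resolve in favor of the higher-ranked agent (insertion order).
    winner, _ = Counter(responses).most_common(1)[0]
    return winner
```

Only `k` agents are ever called, which is where the cost saving would come from in this toy version; the ranking step still scans every agent, so a system at the scale the abstract envisions would need an index or clustering layer in front of it. The sketch also does nothing about adversarial agents that inflate their apparent expertise, one of the challenges the abstract names.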