Hive: A Multi-Agent Infrastructure for Algorithm- and Task-Level Scaling

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses inefficiencies in existing large language model (LLM) agent systems, which suffer from redundant computation across reasoning branches and agent-unaware resource scheduling when scaling at the algorithm and task levels. The authors propose Hive, a multi-agent infrastructure that supports both algorithm- and task-level scaling. Hive uses a description frontend to capture per-agent behavior and support test-time scaling algorithms. Its backend introduces a Logits Cache that reuses intermediate logits across redundant sampling paths, and Agent-Aware Scheduling that allocates compute and KV-cache resources according to agent contributions. In the reported experiments, Logits Cache achieves an average speedup of 1.11–1.76× for re-sampling, and Agent-Aware Scheduling reduces the hotspot miss rate by 33%–51%.
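To make the idea of reusing logits across sampling paths concrete, here is a minimal Python sketch. It is not Hive's implementation: the class `PrefixLogitsCache`, the helper names, and the toy forward function are all hypothetical, and the sketch assumes that the redundancy comes from several sampling paths requesting next-token logits for an identical token prefix.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Logits = List[float]


class PrefixLogitsCache:
    """Memoizes next-token logits by exact token prefix (hypothetical helper)."""

    def __init__(self, forward: Callable[[Tuple[int, ...]], Logits]) -> None:
        self._forward = forward
        self._store: Dict[Tuple[int, ...], Logits] = {}
        self.hits = 0
        self.misses = 0

    def logits(self, prefix: Sequence[int]) -> Logits:
        key = tuple(prefix)
        if key in self._store:
            self.hits += 1  # a previous path already paid for this forward pass
        else:
            self.misses += 1
            self._store[key] = self._forward(key)
        return self._store[key]


def sample_token(logits: Logits, rng: random.Random, temperature: float = 1.0) -> int:
    """Draws one token id from softmax(logits / temperature)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - peak) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]


def toy_forward(prefix: Tuple[int, ...]) -> Logits:
    """Deterministic stand-in for a model forward pass over a 4-token vocabulary."""
    return [float((sum(prefix) + 3 * i) % 5) for i in range(4)]


def sample_paths(cache: PrefixLogitsCache, prompt: Sequence[int],
                 n_paths: int, steps: int, seed: int = 0) -> List[List[int]]:
    """Samples n_paths continuations, consulting the cache before each step."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        tokens = list(prompt)
        for _ in range(steps):
            tokens.append(sample_token(cache.logits(tokens), rng))
        paths.append(tokens)
    return paths


cache = PrefixLogitsCache(toy_forward)
paths = sample_paths(cache, prompt=[1, 2], n_paths=8, steps=3)
print(f"forward passes: {cache.misses}, reused lookups: {cache.hits}")
```

In this toy setting, every path shares the prompt prefix, so only the first path runs the forward pass for it. Later paths also reuse logits whenever they happen to sample the same continuation.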

📝 Abstract

Large language models are increasingly deployed as complex agentic systems that scale with task complexity. While prior work has extensively explored model- and system-level scaling, algorithm- and task-level scaling remain largely unaddressed, constraining the full potential of agentic systems. At the algorithm level, allocating additional inference-time computation can enhance workflow capacity but introduces cross-path redundancy: overlapping computations across multiple reasoning branches. At the task level, complex tasks can be decomposed into subproblems and delegated across multiple agents for improved scalability and parallelism. However, scheduling in existing infrastructures is unaware that multiple agents exist, and so misses opportunities to optimize resource allocation. We propose Hive, a multi-agent infrastructure that enables algorithm- and task-level scaling. Hive features a description frontend that captures per-agent behavior and supports test-time scaling algorithms. Leveraging this specification, our backend introduces two key mechanisms: Logits Cache, which reuses intermediate logits across redundant sampling paths to mitigate cross-path redundancy at the algorithm level, and Agent-Aware Scheduling, which allocates compute and KV-cache resources according to agent contributions at the task level. Experiments show that Logits Cache achieves an average speedup of $1.11\times$ to $1.76\times$ for re-sampling, and Agent-Aware Scheduling reduces the hotspot miss rate by $33\%$ to $51\%$.
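As an illustration of allocating KV-cache resources according to agent contributions, the following Python sketch splits a fixed block budget in proportion to per-agent scores. It is an assumption-laden example, not the paper's scheduler: the function `allocate_kv_blocks`, the agent names, and the contribution scores are hypothetical, and the abstract does not say how contributions are measured or how the allocation is actually computed.

```python
from typing import Dict


def allocate_kv_blocks(contributions: Dict[str, float], total_blocks: int) -> Dict[str, int]:
    """Splits total_blocks across agents in proportion to their contribution scores.

    Uses largest-remainder rounding so the integer shares sum exactly to
    total_blocks. Both the scores and this rule are illustrative assumptions.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be non-negative")
    if any(score < 0 for score in contributions.values()):
        raise ValueError("contribution scores must be non-negative")
    total = sum(contributions.values())
    if total <= 0:
        raise ValueError("at least one contribution score must be positive")

    # Ideal fractional share for each agent.
    exact = {agent: total_blocks * score / total for agent, score in contributions.items()}
    # Round down first, then hand the leftover blocks to the largest remainders.
    alloc = {agent: int(share) for agent, share in exact.items()}
    leftover = total_blocks - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda agent: exact[agent] - alloc[agent], reverse=True)
    for agent in by_remainder[:leftover]:
        alloc[agent] += 1
    return alloc


# Hypothetical agents and scores, for illustration only.
shares = allocate_kv_blocks({"planner": 3.0, "coder": 5.0, "critic": 2.0}, total_blocks=100)
print(shares)
```

A proportional split is only one possible policy. It is used here because it shows how a scheduler that knows about individual agents can favor the agents that matter most to the task, rather than treating all requests uniformly.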

Problem

Research questions and friction points this paper is trying to address.

algorithm-level scaling

task-level scaling

cross-path redundancy

multi-agent systems

resource allocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent infrastructure

algorithm-level scaling

task-level scaling