CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs

📅 2026-05-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of efficient, fair, and privacy-preserving meeting scheduling among multiple LLM agents, each of which can see only its own private calendar. To this end, the authors propose CalBench, which they describe as the first multi-agent coordination benchmark that is simultaneously verifiable, decentralized, and privacy-sensitive. CalBench combines calendar entries annotated with private semantic context, a distributed constraint optimization (DCOP) baseline, an optimal-solution oracle, and a cost-ratio metric (realized cost relative to optimal cost) for quantifying coordination quality. Together these enable systematic evaluation of communication efficiency, fairness, and privacy leakage. According to the summary, experiments show that CalBench can measure the trade-offs among coordination success rate, communication overhead, fairness in cost allocation, and privacy preservation, making it a platform for research on privacy-conscious collaborative protocols.
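The summary names a cost-ratio metric and a fairness measure without giving formulas. The following is a minimal sketch of how such quantities might be computed, not the paper's definitions. The function names `cost_ratio` and `jain_fairness` are hypothetical, and Jain's index is a stand-in assumption for whatever fairness measure the authors actually use.

```python
from typing import Sequence


def cost_ratio(realized_cost: float, optimal_cost: float) -> float:
    """Realized-to-optimal disruption cost (hypothetical helper).

    A value of 1.0 means the agents matched the oracle; larger is worse.
    """
    if realized_cost < 0 or optimal_cost < 0:
        raise ValueError("costs must be non-negative")
    if optimal_cost == 0:
        # Assumption: a zero-cost optimum is matched only by a zero realized cost.
        return 1.0 if realized_cost == 0 else float("inf")
    return realized_cost / optimal_cost


def jain_fairness(per_agent_costs: Sequence[float]) -> float:
    """Jain's fairness index over per-agent disruption costs.

    Equals 1.0 when every agent bears the same cost and 1/n when a single
    agent bears all of it. Used here only as an illustrative fairness measure.
    """
    n = len(per_agent_costs)
    if n == 0:
        raise ValueError("need at least one agent")
    sum_of_squares = sum(c * c for c in per_agent_costs)
    if sum_of_squares == 0:
        return 1.0  # nobody was disrupted, so the split is trivially even
    total = sum(per_agent_costs)
    return (total * total) / (n * sum_of_squares)
```

Under these assumptions, a run that incurs a total cost of 12 against an oracle cost of 10 scores a ratio of 1.2, and a cost split of (12, 0, 0) across three agents scores a fairness of 1/3.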

📝 Abstract

We introduce CalBench, a controlled evaluation environment for studying multi-agent coordination through calendar scheduling. In CalBench, N agents each manage a private calendar containing pre-existing commitments and must coordinate to schedule a stream of M incoming meetings while minimizing disruption costs. Because agents observe only their own calendars, successful scheduling requires communication across private information boundaries. Each scenario is generated with an oracle solution, enabling precise measurement of coordination quality via realized-to-optimal cost, as well as a Distributed Constraint Optimization (DCOP) baseline to provide a fair comparison under the same private-information constraints. CalBench enables precise verification of task success, communication efficiency, and fairness in the distribution of disruption costs. Our environment also studies privacy-preserving coordination by augmenting calendar entries with private semantic contexts of varying sensitivity and measuring whether agents reveal task-irrelevant private information during negotiation. Unlike multi-agent benchmarks where a single capable agent can often substitute for the group, CalBench is inherently decentralized: no agent has access to another agent's private calendar, yet agents must still reach mutually consistent decisions over shared meeting scheduling. CalBench therefore provides a practical and verifiable setting for studying coordination protocols, communication efficiency, negotiation strategies, fairness, and privacy leakage in multi-agent systems.
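To make the oracle idea concrete, here is a toy sketch of a centralized brute-force solver that sees every calendar, which the decentralized agents cannot. Everything in it is an illustrative assumption rather than CalBench's actual model: the names `schedule_cost` and `oracle`, the per-slot disruption costs, and the rule that each meeting occupies a distinct slot.

```python
from itertools import permutations

# Hypothetical private data: slot_costs[agent][slot] is the disruption cost
# that agent pays if a meeting it attends is placed in that slot.
slot_costs = {
    "alice": [0, 3, 1],
    "bob": [2, 0, 1],
    "carol": [1, 1, 0],
}
# Each incoming meeting is described by its participants.
meetings = [("alice", "bob"), ("bob", "carol")]


def schedule_cost(slot_costs, meetings, assignment):
    """Total disruption cost when meeting i is placed in slot assignment[i]."""
    return sum(
        slot_costs[agent][slot]
        for participants, slot in zip(meetings, assignment)
        for agent in participants
    )


def oracle(slot_costs, meetings, num_slots):
    """Brute-force optimum over all assignments of meetings to distinct slots.

    Only feasible for tiny instances; it requires global knowledge of all
    calendars, which no single agent has in the decentralized setting.
    """
    candidates = permutations(range(num_slots), len(meetings))
    best = min(candidates, key=lambda a: schedule_cost(slot_costs, meetings, a))
    return best, schedule_cost(slot_costs, meetings, best)


best_assignment, optimal_cost = oracle(slot_costs, meetings, num_slots=3)
realized_cost = schedule_cost(slot_costs, meetings, (1, 2))  # a negotiated outcome
ratio = realized_cost / optimal_cost  # realized-to-optimal cost
```

In this toy instance the oracle cost is 3, while the example negotiated schedule costs 4, giving a realized-to-optimal ratio of 4/3.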

Problem

Research questions and friction points this paper is trying to address.

multi-agent coordination

privacy preservation

calendar scheduling

private information

coordination-privacy trade-off

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent coordination

privacy-preserving communication

distributed constraint optimization