CalBench: Evaluating Coordination-Privacy Trade-offs in Multi-Agent LLMs

📅 2026-05-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses meeting scheduling among multiple LLM agents, each of which can see only its own private calendar, with the goal of making coordination efficient, fair, and privacy-preserving. To this end, the authors propose CalBench, the first benchmark for multi-agent coordination that is simultaneously verifiable, decentralized, and privacy-sensitive. CalBench combines private semantic contexts of varying sensitivity attached to calendar entries, a distributed constraint optimization (DCOP) baseline, an optimal-solution oracle, and a realized-to-optimal cost ratio that quantifies coordination quality. The authors report that this setup exposes the trade-offs among coordination success rate, communication overhead, fairness in cost allocation, and privacy leakage, offering a controlled platform for research on privacy-conscious coordination protocols.
📝 Abstract
We introduce CalBench, a controlled evaluation environment for studying multi-agent coordination through calendar scheduling. In CalBench, N agents each manage a private calendar containing pre-existing commitments and must coordinate to schedule a stream of M incoming meetings while minimizing disruption costs. Because agents observe only their own calendars, successful scheduling requires communication across private information boundaries. Each scenario is generated with an oracle solution, enabling precise measurement of coordination quality via realized-to-optimal cost, as well as a Distributed Constraint Optimization (DCOP) baseline to provide a fair comparison under the same private-information constraints. CalBench enables precise verification of task success, communication efficiency, and fairness in the distribution of disruption costs. Our environment also studies privacy-preserving coordination by augmenting calendar entries with private semantic contexts of varying sensitivity and measuring whether agents reveal task-irrelevant private information during negotiation. Unlike multi-agent benchmarks where a single capable agent can often substitute for the group, CalBench is inherently decentralized: no agent has access to another agent's private calendar, yet agents must still reach mutually consistent decisions over shared meeting scheduling. CalBench therefore provides a practical and verifiable setting for studying coordination protocols, communication efficiency, negotiation strategies, fairness, and privacy leakage in multi-agent systems.
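To make the realized-to-optimal cost idea concrete, here is a minimal sketch in Python. It is not the CalBench implementation: the toy calendars, the per-slot disruption costs, the brute-force oracle, the helper names (per_agent_costs, is_feasible, oracle, cost_ratio), and the max-minus-min fairness spread are all assumptions made for illustration. The abstract states only that each scenario ships with an oracle solution and that quality is measured via realized-to-optimal cost.

```python
from itertools import product

# Hypothetical toy scenario (not from the paper): calendars[agent][slot] is the
# disruption cost that agent pays if a new meeting lands on that slot.
calendars = {
    "alice": [0, 3, 1],
    "bob":   [2, 0, 2],
    "carol": [0, 0, 4],
}
meetings = [("alice", "bob"), ("bob", "carol")]  # participants per meeting


def per_agent_costs(assignment):
    """Return each agent's total disruption cost for a slot assignment."""
    costs = {agent: 0 for agent in calendars}
    for participants, slot in zip(meetings, assignment):
        for agent in participants:
            costs[agent] += calendars[agent][slot]
    return costs


def is_feasible(assignment):
    """An agent cannot attend two new meetings in the same slot."""
    seen = set()
    for participants, slot in zip(meetings, assignment):
        for agent in participants:
            if (agent, slot) in seen:
                return False
            seen.add((agent, slot))
    return True


def oracle():
    """Brute-force the feasible assignment with minimum total cost.

    This sees every calendar at once, which the agents themselves cannot do.
    """
    n_slots = len(next(iter(calendars.values())))
    candidates = product(range(n_slots), repeat=len(meetings))
    feasible = [a for a in candidates if is_feasible(a)]
    return min(feasible, key=lambda a: sum(per_agent_costs(a).values()))


def cost_ratio(realized, optimal):
    """Realized-to-optimal total cost; 1.0 means the agents matched the oracle."""
    opt = sum(per_agent_costs(optimal).values())
    real = sum(per_agent_costs(realized).values())
    if opt == 0:
        return 1.0 if real == 0 else float("inf")
    return real / opt


optimal = oracle()
realized = (2, 0)  # slots the agents hypothetically agreed on after negotiating
ratio = cost_ratio(realized, optimal)

# Assumed fairness proxy: gap between the most and least disrupted agent.
realized_costs = per_agent_costs(realized)
spread = max(realized_costs.values()) - min(realized_costs.values())
```

In this toy case the oracle picks slots (0, 1) at total cost 2, while the negotiated outcome (2, 0) costs 5, giving a ratio of 2.5 with bob absorbing most of the disruption. Exhaustive search is only workable at this scale; the abstract does not say how the oracle solution is computed.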
Problem

Research questions and friction points this paper is trying to address.

multi-agent coordination
privacy preservation
calendar scheduling
private information
coordination-privacy trade-off
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent coordination
privacy-preserving communication
distributed constraint optimization
controlled evaluation benchmark
private information boundaries