Cost of Structural Learning Under Censored Feedback: A Threshold-Bandit Approach

📅 2026-05-26
🤖 AI Summary
This work addresses a multi-agent collaboration setting in which a task yields reward only if the coalition size meets an unknown threshold; otherwise, feedback is fully censored, so agents cannot tell whether a failure stems from environmental stochasticity or from insufficient coordination. The authors formalize this scenario as a Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and propose a centralized algorithm (C-TAC) and a decentralized variant (D-TAC). C-TAC achieves O(log T) cumulative regret, which decomposes into a structural-search term (the cost of resolving feasibility under censored feedback) and a statistical-monitoring term (value estimation). D-TAC is an event-triggered protocol in which agents synchronize only when their structural beliefs change; empirically, it reduces communication by 23× relative to the centralized baseline while preserving feasibility alignment under conservative belief fusion.
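The identifiability problem described above can be illustrated with a small sketch. It is not the paper's C-TAC algorithm. It assumes Bernoulli rewards and assumes that a censored round is observed as a plain 0, so it looks identical to a stochastic failure. The names `observe` and `smallest_certified_size` are hypothetical, and the linear scan over coalition sizes is only one naive way to resolve feasibility.

```python
import random


def observe(coalition_size, threshold, success_prob, rng):
    """Return the reward an agent sees for one round.

    Assumption: below the threshold the round is censored and shows up as 0,
    which is indistinguishable from a stochastic failure above the threshold.
    """
    if coalition_size < threshold:
        return 0  # censored: no information about the arm's value
    return 1 if rng.random() < success_prob else 0


def smallest_certified_size(threshold, success_prob, max_size, pulls_per_size, seed=0):
    """Naive structural search: scan sizes upward until one success is seen.

    A single success certifies feasibility, because sizes below the
    threshold can never produce a reward. Returns None if nothing succeeds.
    """
    rng = random.Random(seed)
    for size in range(1, max_size + 1):
        for _ in range(pulls_per_size):
            if observe(size, threshold, success_prob, rng) == 1:
                return size
    return None


if __name__ == "__main__":
    found = smallest_certified_size(
        threshold=4, success_prob=0.3, max_size=8, pulls_per_size=50
    )
    print("smallest size with an observed success:", found)
```

Every pull spent below the threshold returns 0 regardless of the arm's quality, which is one way to picture the structural-search cost: those rounds resolve feasibility but contribute nothing to value estimation.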
📝 Abstract
In many multi-agent applications, tasks yield rewards only when executed by a coalition meeting an unknown size threshold; otherwise, feedback is fully censored. This censorship creates an identifiability problem: agents cannot distinguish stochastic failure from insufficient coordination. We formalize this setting as the Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and analyze it under both centralized and decentralized coordination. We show that a centralized algorithm (C-TAC) achieves cumulative regret O(log T), decomposed into a structural-search term that captures the cost of resolving feasibility under censored feedback and a statistical-monitoring term for value estimation. We then introduce D-TAC, a decentralized event-triggered protocol in which agents synchronize only when their structural beliefs change. Empirically, D-TAC achieves a 23x reduction in communication relative to the centralized baseline while preserving feasibility alignment under conservative belief fusion. These results characterize the coordination cost of learning under censored feedback and show that near-centralized communication efficiency is achievable without continuous synchronization.
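To make the event-triggered idea concrete, the following sketch compares message counts when agents broadcast every round versus only when their structural belief changes. It is an illustration under stated assumptions, not the D-TAC protocol: each belief is assumed to be a single integer (the agent's current estimate of the required coalition size), and "conservative" fusion is interpreted here as taking the largest estimate so the fused coalition stays feasible if any agent's estimate is. The function names and belief traces are hypothetical.

```python
def count_messages(belief_traces):
    """Compare always-sync against event-triggered sync.

    belief_traces: one list per agent, giving that agent's belief per round.
    Returns (messages if every agent broadcasts every round,
             messages if an agent broadcasts only when its belief changes).
    """
    always = sum(len(trace) for trace in belief_traces)
    triggered = 0
    for trace in belief_traces:
        last_sent = None
        for belief in trace:
            if belief != last_sent:  # the trigger: belief differs from last broadcast
                triggered += 1
                last_sent = belief
    return always, triggered


def fuse_conservatively(beliefs):
    """Assumed fusion rule: adopt the largest required-size estimate."""
    return max(beliefs)


if __name__ == "__main__":
    # Hypothetical belief traces for two agents over six rounds.
    traces = [[5, 5, 4, 4, 4, 3], [5, 5, 5, 5, 4, 4]]
    always, triggered = count_messages(traces)
    print("always-sync messages:", always)
    print("event-triggered messages:", triggered)
    print("fused belief:", fuse_conservatively([t[-1] for t in traces]))
```

Because structural beliefs change rarely once feasibility is resolved, the triggered count grows with the number of belief updates rather than with the horizon, which is the intuition behind synchronizing only on belief changes.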
Problem

Research questions and friction points this paper is trying to address.

Censored Feedback
Multi-Agent Coordination
Threshold-Activated Reward
Identifiability Problem
Structural Learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Threshold-Bandit
Censored Feedback
Decentralized Coordination
Structural Learning
Event-Triggered Communication