Cost of Structural Learning Under Censored Feedback: A Threshold-Bandit Approach

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a multi-agent collaboration setting in which a task yields reward only if the coalition size meets an unknown threshold; otherwise, feedback is entirely censored, so agents cannot tell whether a failure stems from environmental stochasticity or from insufficient coordination. The authors formalize this scenario as a Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and propose both a centralized algorithm (C-TAC) and a decentralized variant (D-TAC). C-TAC achieves O(log T) cumulative regret, which the analysis decomposes into a structural-search term and a statistical-estimation term. D-TAC uses event-triggered synchronization and a conservative belief fusion mechanism; in experiments it reduces communication by 23× relative to the centralized baseline while preserving feasibility alignment.
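The identifiability problem described above can be illustrated with a toy feedback model. The sketch below is not the paper's environment: it assumes Bernoulli rewards, assumes that a censored pull is observed as a plain 0, and uses hypothetical names (`observe`, `update_certified_size`). Under those assumptions a 0 is ambiguous, while a single positive reward certifies that the chosen coalition size is feasible.

```python
import random
from typing import Optional


def observe(threshold: int, mean: float, coalition_size: int,
            rng: random.Random) -> float:
    """Return the reward a coalition sees for one pull (hypothetical model).

    Assumption: a censored pull looks exactly like a stochastic failure (0.0).
    """
    if coalition_size < threshold:
        # Coalition too small: the task never activates, feedback is censored.
        return 0.0
    # Coalition large enough: ordinary Bernoulli reward with the given mean.
    return 1.0 if rng.random() < mean else 0.0


def update_certified_size(certified: Optional[int], coalition_size: int,
                          reward: float) -> Optional[int]:
    """Track the smallest coalition size at which a positive reward was seen.

    A positive reward proves feasibility; a zero reward proves nothing.
    """
    if reward > 0 and (certified is None or coalition_size < certified):
        return coalition_size
    return certified


if __name__ == "__main__":
    rng = random.Random(0)
    certified = None
    for size in (1, 2, 3, 4):
        for _ in range(20):
            certified = update_certified_size(
                certified, size, observe(3, 0.6, size, rng))
    print("smallest certified feasible size:", certified)
```

In this toy model the learner can only rule a size out statistically, by seeing many zeros, which is one way to read the structural-search cost that the regret analysis accounts for.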

📝 Abstract

In many multi-agent applications, tasks yield rewards only when executed by a coalition meeting an unknown size threshold; otherwise, feedback is fully censored. This censorship creates an identifiability problem: agents cannot distinguish stochastic failure from insufficient coordination. We formalize this setting as the Threshold-Activated Cooperative Multi-Armed Bandit (TAC-MAB) and analyze it under both centralized and decentralized coordination. We show that a centralized algorithm (C-TAC) achieves cumulative regret O(log T), decomposed into a structural-search term that captures the cost of resolving feasibility under censored feedback and a statistical-monitoring term for value estimation. We then introduce D-TAC, a decentralized event-triggered protocol in which agents synchronize only when their structural beliefs change. Empirically, D-TAC achieves a 23× reduction in communication relative to the centralized baseline while preserving feasibility alignment under conservative belief fusion. These results characterize the coordination cost of learning under censored feedback and show that near-centralized coordination efficiency is achievable without continuous synchronization.
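The event-triggered idea in the abstract, where agents communicate only when a structural belief changes, can be sketched roughly as follows. This is an illustrative toy, not the D-TAC protocol: the `Agent` class, `conservative_fuse`, and `run_round` are hypothetical names, each agent's belief is reduced to a single coalition size it currently considers feasible, and "conservative" fusion is assumed here to mean taking the largest reported size so that the fused value is feasible under every agent's belief.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    feasible_size: int  # coalition size this agent currently believes is feasible

    def local_update(self, new_estimate: int) -> bool:
        """Apply a local belief update; return True if the belief changed."""
        changed = new_estimate != self.feasible_size
        self.feasible_size = new_estimate
        return changed


def conservative_fuse(estimates: List[int]) -> int:
    """Assumed fusion rule: keep the largest (most cautious) feasible size."""
    return max(estimates)


def run_round(agents: List[Agent], new_estimates: List[int]) -> int:
    """Run one round and return the number of broadcast messages sent."""
    # Only agents whose belief changed this round broadcast (event trigger).
    received = [a.feasible_size
                for a, e in zip(agents, new_estimates)
                if a.local_update(e)]
    # Every agent fuses the received beliefs with its own.
    for a in agents:
        a.feasible_size = conservative_fuse([a.feasible_size, *received])
    return len(received)


if __name__ == "__main__":
    agents = [Agent(3), Agent(3), Agent(3)]
    print(run_round(agents, [3, 3, 3]))  # no belief changed, no messages
    print(run_round(agents, [3, 4, 3]))  # one agent changed, one broadcast
```

Rounds in which no belief changes cost no communication in this sketch, which is the intuition behind the reported savings over a scheme that synchronizes every round.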

Problem

Research questions and friction points this paper is trying to address.

censored feedback

multi-agent coordination

threshold-activated reward

identifiability problem

structural learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Threshold-Bandit

Censored Feedback

Decentralized Coordination

Structural Learning

Event-Triggered Communication

🔎 Similar Papers

Prior-Dependent Allocations for Bayesian Fixed-Budget Best-Arm Identification in Structured Bandits

2024-02-08, arXiv.org, Citations: 2

Pretraining Decision Transformers with Reward Prediction for In-Context Multi-task Structured Bandit Learning

2024-06-07, arXiv.org, Citations: 1