🤖 AI Summary
This work addresses the lack of formal performance guarantees for unseen tasks in multitask reinforcement learning, guarantees that are indispensable in safety-critical settings. To this end, it presents the first generalization error bound that holds for arbitrary, unknown task distributions. The bound composes a lower confidence bound on single-task performance, estimated from finitely many trajectories, with task-level generalization over finitely many sampled tasks, yielding a high-confidence guarantee on policy performance for novel tasks drawn from the same distribution. Empirical validation across several state-of-the-art multitask RL algorithms shows that the resulting guarantees are theoretically sound and informative at practical sample sizes.
📝 Abstract
Multi-task reinforcement learning trains generalist policies that can execute multiple tasks. While recent years have seen significant progress, existing approaches rarely provide formal performance guarantees, which are indispensable when deploying policies in safety-critical settings. We present an approach for computing high-confidence guarantees on the performance of a multi-task policy on tasks not seen during training. Concretely, we introduce a new generalisation bound that composes (i) per-task lower confidence bounds from finitely many rollouts with (ii) task-level generalisation from finitely many sampled tasks, yielding a high-confidence guarantee for new tasks drawn from the same arbitrary and unknown distribution. Across state-of-the-art multi-task RL methods, we show that the guarantees are theoretically sound and informative at realistic sample sizes.
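The two-level composition described above can be sketched in code. This is not the paper's actual bound, just a minimal illustration under strong assumptions: per-task returns lie in [0, 1] and are i.i.d. across rollouts, so a Hoeffding lower confidence bound applies per task, and tasks are i.i.d. from the unknown distribution, so a distribution-free order-statistic (Wilks-style tolerance) bound covers unseen tasks. The function names and the specific concentration inequalities are illustrative choices, not the paper's.

```python
import numpy as np

def per_task_lcb(returns, delta):
    """Hoeffding lower confidence bound on one task's expected return.

    Assumes the rollout returns are i.i.d. and bounded in [0, 1]; holds
    with probability >= 1 - delta over the n rollouts.
    """
    n = len(returns)
    return float(np.mean(returns)) - np.sqrt(np.log(1.0 / delta) / (2 * n))

def unseen_task_guarantee(task_returns, delta_task=0.05, delta_eval=0.05):
    """Compose per-task LCBs with a task-level tolerance bound.

    task_returns: list of m arrays, one array of rollout returns per
    training task. A union bound splits delta_eval across the m tasks.
    The task-level step uses the one-sided order-statistic argument:
    with probability >= 1 - delta_task over the m sampled tasks, at
    least a (1 - eps) fraction of new tasks from the same distribution
    have expected return >= the minimum per-task LCB, where
    eps = 1 - delta_task ** (1 / m).

    Returns (threshold, eps): the performance threshold and the task
    mass that may fall below it; total confidence is
    >= 1 - delta_task - delta_eval.
    """
    m = len(task_returns)
    lcbs = [per_task_lcb(r, delta_eval / m) for r in task_returns]
    threshold = min(lcbs)
    eps = 1.0 - delta_task ** (1.0 / m)
    return threshold, eps
```

Note the trade-off this makes visible: with few sampled tasks m, the guaranteed fraction 1 - eps of covered unseen tasks is small even if every per-task LCB is high, so both rollouts per task and the number of tasks drive the strength of the final guarantee.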