🤖 AI Summary
This paper investigates the Sequential Fault-Intolerant Process Planning (SFIPP) problem, a multi-stage decision-making setting in which reward is granted only if every stage succeeds. This all-or-nothing structure contrasts with conventional cumulative (additive) reward paradigms and arises in high-reliability domains such as drug and material discovery, security, and quality-critical product design. We provide the first formal model of SFIPP. For both deterministic and stochastic action settings, we propose online algorithms that combine structural prior knowledge with multi-armed bandit strategies (UCB and Thompson sampling) to balance exploration and exploitation. We establish tight theoretical guarantees on the competitive ratio for both algorithms. Empirical evaluation shows that the specialized, structure-aware algorithms outperform our more general algorithm.
📝 Abstract
We propose and study a planning problem that we call Sequential Fault-Intolerant Process Planning (SFIPP). SFIPP captures a reward structure common to many sequential multi-stage decision problems, in which a plan is deemed successful only if all stages succeed. Such reward structures differ from classic additive reward structures and arise in important applications such as drug/material discovery, security, and quality-critical product design. We design provably tight online algorithms for settings in which, at each stage, we must choose among different actions with unknown success chances. We do so both for the foundational case in which action outcomes are deterministic and for the case of probabilistic action outcomes, where we balance exploration for learning and exploitation for planning by using multi-armed bandit algorithms. In our empirical evaluations, we demonstrate that the specialized algorithms we develop, which leverage additional information about the structure of the SFIPP instance, outperform our more general algorithm.
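To make the all-or-nothing reward structure concrete, here is a minimal Python sketch. It is not the paper's algorithm: it assumes a textbook UCB1 rule applied independently at each stage, and it assumes the learner observes the outcome of every stage it reaches up to the first failure. The names `run_plan`, `ucb_pick`, and `simulate` are hypothetical and chosen for illustration only.

```python
import math
import random


def run_plan(success_probs, rng):
    """Return 1 only if every stage succeeds, else 0 (all-or-nothing reward)."""
    for p in success_probs:
        if rng.random() >= p:
            return 0  # a single failed stage fails the whole plan
    return 1


def ucb_pick(successes, pulls, t):
    """Standard UCB1 choice for one stage; untried actions are tried first."""
    for action, n in enumerate(pulls):
        if n == 0:
            return action
    return max(
        range(len(pulls)),
        key=lambda a: successes[a] / pulls[a]
        + math.sqrt(2 * math.log(t) / pulls[a]),
    )


def simulate(stage_probs, rounds, seed=0):
    """Play `rounds` plans; stage_probs[s][a] is the (unknown) success chance.

    Assumed feedback model: stage outcomes are observed until the first failure.
    """
    rng = random.Random(seed)
    successes = [[0] * len(ps) for ps in stage_probs]
    pulls = [[0] * len(ps) for ps in stage_probs]
    total_reward = 0
    for t in range(1, rounds + 1):
        plan_ok = 1
        for s, ps in enumerate(stage_probs):
            a = ucb_pick(successes[s], pulls[s], t)
            ok = rng.random() < ps[a]
            pulls[s][a] += 1
            successes[s][a] += ok
            if not ok:
                plan_ok = 0
                break  # later stages are never reached in this round
        total_reward += plan_ok
    return total_reward, pulls


if __name__ == "__main__":
    reward, pulls = simulate([[0.9, 0.5], [0.6, 0.95]], rounds=1000)
    print(reward, pulls)
```

Under an additive reward, each successful stage would contribute on its own; here the per-round reward is the product of the stage outcomes, so one weak stage caps the success rate of the whole plan regardless of how well the other stages are chosen.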