🤖 AI Summary
This work addresses the joint sensor and actuator selection problem for factored Markov decision processes (fMDPs) under a budget constraint: given partially observable states, select a limited subset of sensors and actuators to maximize the infinite-horizon discounted reward under the optimal policy. We first establish that this problem is NP-hard to approximate within any nontrivial factor, both for fMDPs and for a general class of POMDPs. We also identify conditions under which the greedy algorithm provably fails. Nevertheless, extensive simulations show that the greedy heuristic consistently achieves near-optimal performance on both real-world and synthetic instances. Together, these results characterize the intrinsic computational hardness of perception–control co-design and offer both fundamental theoretical limits and practical guidance for designing sensing and actuation architectures in resource-constrained intelligent agents.
📝 Abstract
Factored Markov Decision Processes (fMDPs) are a class of Markov Decision Processes (MDPs) in which the states (and actions) can be factored into a set of state (and action) variables and can be encoded compactly using a factored representation. In this paper, we consider a setting where the state of the fMDP is not directly observable, and the agent relies on a set of potential sensors to gather information. Each sensor has a selection cost, and the designer must select a subset of sensors under a limited budget. We formulate the problem of selecting a set of sensors for fMDPs (under a budget) to maximize the infinite-horizon discounted return provided by the optimal policy. We show the fundamental result that it is NP-hard to approximate this problem to within any non-trivial factor. Our inapproximability results for optimal sensor selection also extend to a general class of Partially Observable MDPs (POMDPs). We then study the dual problem of budgeted actuator selection (at design-time) to maximize the expected return under the optimal policy. Again, we show that it is NP-hard to approximate this problem to within any non-trivial factor. Furthermore, with explicit examples, we show the failure of greedy algorithms for both the sensor and actuator selection problems and provide insights into the factors that make these problems challenging. Despite this, through extensive simulations, we show the practical effectiveness and near-optimal performance of the greedy algorithm for actuator and sensor selection in many real-world and randomly generated instances.
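The budgeted greedy heuristic discussed above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names and the toy set-coverage oracle are hypothetical stand-ins, whereas in the paper the value of a sensor set would be the expected discounted return of the optimal policy given those sensors, which is expensive to evaluate.

```python
def greedy_select(sensors, costs, budget, value):
    """Greedily add the affordable sensor with the best marginal
    gain per unit cost until no sensor improves the value."""
    selected = set()
    remaining = set(sensors)
    spent = 0
    while remaining:
        base = value(selected)
        best, best_ratio = None, 0.0
        for s in remaining:
            if spent + costs[s] > budget:
                continue  # cannot afford this sensor
            gain = value(selected | {s}) - base
            ratio = gain / costs[s]
            if ratio > best_ratio:
                best, best_ratio = s, ratio
        if best is None:
            break  # no affordable sensor yields positive gain
        selected.add(best)
        spent += costs[best]
        remaining.remove(best)
    return selected

# Toy value oracle (hypothetical): each sensor observes a subset of
# state variables; the value of a set is how many distinct variables
# it covers. This stands in for the optimal-policy return.
coverage = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 2, 3, 4}}
costs = {"a": 1, "b": 1, "c": 1, "d": 3}
value = lambda S: len(set().union(*[coverage[s] for s in S])) if S else 0

picked = greedy_select(coverage.keys(), costs, budget=2, value=value)
```

With budget 2, the greedy rule skips the expensive sensor "d" and combines two cheap sensors instead, covering three variables. The paper's negative results show that no such polynomial-time rule can carry a worst-case guarantee, yet its experiments indicate this kind of heuristic is near-optimal on many practical instances.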