MDPs with a State Sensing Cost

📅 2025-05-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses cost-aware sequential decision-making, where an agent must explicitly trade off the cost of observing the state against the reward from state-specific optimal actions, departing from the standard assumption of free, perfect state observability. The authors formulate the when-to-sense decision as a discounted-cost Markov decision process (MDP) over an augmented state space, introducing "blind" states that permit action selection without observation. Theoretically, they analyze policies with a bounded number of consecutive blind actions and derive a provable suboptimality bound; methodologically, they design an efficient heuristic algorithm based on policy improvement. Numerical experiments show that the algorithm achieves near-optimal performance across multiple benchmarks and significantly outperforms existing baselines. The core contribution is unifying the coupled sensing-decision process within a single MDP framework, providing both interpretable theoretical guarantees and a practical computational solution.

📝 Abstract
In many practical sequential decision-making problems, tracking the state of the environment incurs a sensing/communication/computation cost. In these settings, the agent's interaction with its environment includes the additional component of deciding $\textit{when}$ to sense the state, in a manner that balances the value associated with optimal (state-specific) actions and the cost of sensing. We formulate this as an expected discounted cost Markov Decision Process (MDP), wherein the agent incurs an additional cost for sensing its next state, but has the option to take actions while remaining 'blind' to the system state. We pose this problem as a classical discounted cost MDP with an expanded (countably infinite) state space. While computing the optimal policy for this MDP is intractable in general, we bound the sub-optimality gap associated with optimal policies in a restricted class, where the number of consecutive non-sensing (a.k.a., blind) actions is capped. We also design a computationally efficient heuristic algorithm based on policy improvement, which in practice performs close to the optimal policy. Finally, we benchmark against the state of the art via a numerical case study.
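The restricted policy class in the abstract (at most K consecutive blind actions, with the augmented state tracking what was last observed and what has been done since) can be made concrete with value iteration on a toy instance. Everything below is an illustrative assumption, not the paper's construction: the two-state, two-action MDP, the cost matrices, and the parameter values are made up, and the augmented state is encoded as (last sensed state, tuple of blind actions taken since sensing).

```python
import numpy as np
from itertools import product

# Toy instance (all numbers illustrative, not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])   # P[s, a, s']
C = np.array([[1.0, 4.0],
              [5.0, 0.5]])                  # per-step cost C[s, a]
gamma, c_sense, K = 0.9, 0.8, 2             # discount, sensing cost, blind cap

nS, nA = C.shape

# Augmented state: (last sensed state s, tuple of blind actions taken since).
# Capping the blind-action sequence at K makes this space finite.
seqs = [t for k in range(K + 1) for t in product(range(nA), repeat=k)]
states = [(s, sig) for s in range(nS) for sig in seqs]
idx = {x: i for i, x in enumerate(states)}

def belief(s, sig):
    """Distribution of the true state after taking actions sig blind from s."""
    b = np.eye(nS)[s]
    for a in sig:
        b = b @ P[:, a, :]
    return b

V = np.zeros(len(states))
for _ in range(1000):
    V_new = np.empty_like(V)
    for i, (s, sig) in enumerate(states):
        b = belief(s, sig)
        # Option 1: pay c_sense, observe the true state s2, then act optimally.
        sense = c_sense + sum(
            b[s2] * min(C[s2, a] + gamma * V[idx[(s2, (a,))]]
                        for a in range(nA))
            for s2 in range(nS))
        best = sense
        # Option 2: take a blind action, allowed only below the cap K.
        if len(sig) < K:
            for a in range(nA):
                best = min(best, b @ C[:, a] + gamma * V[idx[(s, sig + (a,))]])
        V_new[i] = best
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new
```

With nA blind actions and cap K, the augmented space has nS * (nA^(K+1) - 1)/(nA - 1) states, which is why the uncapped problem (K unbounded) has a countably infinite state space, as the abstract notes.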
Problem

Research questions and friction points this paper is trying to address.

Balancing sensing costs and optimal actions in MDPs
Formulating MDPs with expanded state space for blind actions
Designing efficient heuristic policies for state sensing trade-offs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Augments the MDP state space to incorporate a state sensing cost
Bounds the sub-optimality gap of policies with capped blind actions
Designs an efficient policy-improvement-based heuristic algorithm
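The abstract names policy improvement as the basis of the heuristic but gives no details. The sketch below shows one generic way such a step could look; the toy instance, the always-sense base policy, and the one-step lookahead rule for skipping a sense are all assumptions for illustration, not the paper's actual algorithm.

```python
import numpy as np

# Illustrative instance (numbers made up, not from the paper).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.3, 0.7], [0.6, 0.4]]])   # P[s, a, s']
C = np.array([[1.0, 4.0],
              [5.0, 0.5]])                  # per-step cost C[s, a]
gamma, c_sense = 0.9, 0.8

nS, nA = C.shape

# Base policy: sense every step. Solve the fully observed MDP with the
# sensing cost folded into each stage (standard value iteration).
V = np.zeros(nS)
for _ in range(2000):
    Q = c_sense + C + gamma * np.einsum('sap,p->sa', P, V)
    V_new = Q.min(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new
Q = c_sense + C + gamma * np.einsum('sap,p->sa', P, V)
pi = Q.argmin(axis=1)                       # greedy always-sense policy

def skip_sensing(s):
    """Improvement test: after acting from observed state s, is one blind
    step (sensing afterwards) cheaper than sensing immediately?"""
    b = P[s, pi[s], :]                      # predicted next-state distribution
    sense_now = b @ V                       # V already charges c_sense per step
    blind_one = min(b @ C[:, a] + gamma * (b @ P[:, a, :]) @ V
                    for a in range(nA))
    return blind_one < sense_now

skip = [skip_sensing(s) for s in range(nS)]
```

The appeal of this pattern, and plausibly of the paper's heuristic, is that the expensive computation (value iteration) runs on the small base state space, while the decision of when to stay blind is made by cheap one-step comparisons against the base value function.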