Reinforcement Learning with Action-Triggered Observations

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Studies reinforcement learning where observations are triggered by actions, proposing the ATST-MDP framework and an action-sequence learning paradigm, and designing the ST-LSVI-UCB algorithm with a proof of its theoretical feasibility.

📝 Abstract
We study reinforcement learning problems where state observations are stochastically triggered by actions, a constraint common in many real-world applications. This framework is formulated as Action-Triggered Sporadically Traceable Markov Decision Processes (ATST-MDPs), where each action has a specified probability of triggering a state observation. We derive tailored Bellman optimality equations for this framework and introduce the action-sequence learning paradigm, in which agents commit to executing a sequence of actions until the next observation arrives. Under the linear MDP assumption, value functions are shown to admit linear representations in an induced action-sequence feature map. Leveraging this structure, we propose off-policy estimators with statistical error guarantees for such feature maps and introduce ST-LSVI-UCB, a variant of LSVI-UCB adapted for action-triggered settings. ST-LSVI-UCB achieves regret $\widetilde{O}(\sqrt{Kd^3(1-\gamma)^{-3}})$, where $K$ is the number of episodes, $d$ the feature dimension, and $\gamma$ the discount factor (per-step episode non-termination probability). Crucially, this work establishes the theoretical foundation for learning with sporadic, action-triggered observations while demonstrating that efficient learning remains feasible under such observation constraints.
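The action-sequence paradigm described in the abstract can be illustrated with a minimal sketch. The helper below (`run_until_observation`, `env_step`, and `trigger_prob` are hypothetical names, not from the paper) executes a committed action sequence against a latent transition function; after each action, an observation arrives with that action's trigger probability, at which point the agent finally sees the new state:

```python
import random

def run_until_observation(env_step, state, action_seq, trigger_prob, rng=random):
    """Execute a committed action sequence until an observation arrives.

    Illustrative sketch of the action-sequence paradigm: the agent commits
    to `action_seq` and only observes the state when some action triggers
    an observation, which happens with probability `trigger_prob[a]`.

    Returns (observed_state, steps_taken); observed_state is None if the
    whole sequence runs out without any observation being triggered.
    """
    for i, action in enumerate(action_seq):
        state = env_step(state, action)        # latent transition (unobserved)
        if rng.random() < trigger_prob[action]:
            return state, i + 1                # observation after i + 1 steps
    return None, len(action_seq)               # sequence exhausted, still blind
```

For example, with `trigger_prob` equal to 1 for every action the agent observes after each step (recovering the standard MDP setting), while probabilities below 1 force it to plan over whole action sequences between observations.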
Problem

Research questions and friction points this paper is trying to address.

Reinforcement learning with stochastic action-triggered state observations
Formulating ATST-MDPs with tailored Bellman optimality equations
Developing efficient algorithms with theoretical guarantees for sporadic observations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Action-triggered observations in MDP formulation
Action-sequence learning with linear representations
ST-LSVI-UCB algorithm with statistical guarantees
Alexander Ryabchenko
University of Toronto and Vector Institute
Wenlong Mou
University of Toronto
machine learning · statistics · optimization · applied probability