Global Convergence of Average Reward Constrained MDPs with Neural Critic and General Policy Parameterization

📅 2026-03-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work studies infinite-horizon, average-reward Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics. It proposes a primal-dual natural actor-critic algorithm that combines neural critic estimation with natural policy gradient updates and uses Neural Tangent Kernel (NTK) theory to control function-approximation error under Markovian sampling, without access to mixing-time oracles. The analysis establishes global convergence and cumulative constraint violation rates of $\tilde{\mathcal{O}}(T^{-1/4})$, up to approximation errors from the policy and critic classes. According to the authors, these are the first such guarantees for CMDPs with general policies and multi-layer neural critics.

📝 Abstract

We study infinite-horizon Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics. Existing theoretical analyses for constrained reinforcement learning largely rely on tabular policies or linear critics, which limits their applicability to high-dimensional and continuous control problems. We propose a primal-dual natural actor-critic algorithm that integrates neural critic estimation with natural policy gradient updates and leverages Neural Tangent Kernel (NTK) theory to control function-approximation error under Markovian sampling, without requiring access to mixing-time oracles. We establish global convergence and cumulative constraint violation rates of $\tilde{\mathcal{O}}(T^{-1/4})$ up to approximation errors induced by the policy and critic classes. Our results provide the first such guarantees for CMDPs with general policies and multi-layer neural critics, substantially extending the theoretical foundations of actor-critic methods beyond the linear-critic regime.
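
To make the primal-dual idea concrete, here is a toy Python sketch. It is not the paper's algorithm: it assumes a single-state problem with a tabular softmax policy, exact (not neural) value estimates, and a constraint of the form "expected cost at most `budget`". The function name `primal_dual_step` and every parameter are hypothetical. For a tabular softmax policy the natural policy gradient step reduces to adding the scaled advantage to the logits, which is what the primal update does; the dual update is projected gradient ascent on the multiplier.

```python
import numpy as np


def softmax(theta):
    """Numerically stable softmax over action logits."""
    z = theta - theta.max()
    p = np.exp(z)
    return p / p.sum()


def primal_dual_step(theta, lam, reward, cost, budget,
                     alpha=0.1, beta=0.1, lam_max=10.0):
    """One primal-dual update on a single-state toy problem (illustrative only).

    Assumptions, none taken from the paper: exact per-action rewards and
    costs are known, the policy is tabular softmax, and the constraint is
    E[cost] <= budget.
    """
    pi = softmax(theta)
    # Per-action payoff of the Lagrangian: reward minus weighted cost.
    lagrangian_payoff = reward - lam * cost
    # Advantage of each action under the current policy.
    advantage = lagrangian_payoff - pi @ lagrangian_payoff
    # Primal step: natural policy gradient for tabular softmax adds the
    # scaled advantage to the logits.
    theta_new = theta + alpha * advantage
    # Dual step: raise lam when the constraint is violated, then project
    # back onto [0, lam_max].
    violation = pi @ cost - budget
    lam_new = float(np.clip(lam + beta * violation, 0.0, lam_max))
    return theta_new, lam_new


if __name__ == "__main__":
    theta, lam = np.zeros(2), 0.0
    reward = np.array([1.0, 0.0])  # action 0 pays more ...
    cost = np.array([1.0, 0.0])    # ... but also incurs the cost
    for _ in range(2000):
        theta, lam = primal_dual_step(theta, lam, reward, cost, budget=0.25)
    print("policy:", softmax(theta), "multiplier:", lam)
```

In the setting the abstract describes, the exact advantage would be replaced by an estimate from a multi-layer neural critic trained on Markovian samples, and the policy could be any differentiable parameterization; the sketch only shows how the policy and the multiplier are updated in alternation.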

Problem

Research questions and friction points this paper is trying to address.

Constrained Markov Decision Processes

neural critic

general policy parameterization

global convergence

actor-critic

Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrained MDPs

Neural Critic

Natural Actor-Critic