🤖 AI Summary
This work studies infinite-horizon Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics, a setting where prior theory mostly assumed tabular policies or linear critics. The authors propose a primal-dual natural actor-critic algorithm that combines neural critic estimation with natural policy gradient updates and uses Neural Tangent Kernel (NTK) theory to control function-approximation error under Markovian sampling, without mixing-time oracles. They prove global convergence and cumulative constraint violation rates of $\tilde{\mathcal{O}}(T^{-1/4})$ up to policy- and critic-class approximation errors. The primary contribution is the first such guarantee for CMDPs with general policies and multi-layer neural critics, extending actor-critic theory beyond the linear-critic regime.
📝 Abstract
We study infinite-horizon Constrained Markov Decision Processes (CMDPs) with general policy parameterizations and multi-layer neural network critics. Existing theoretical analyses for constrained reinforcement learning largely rely on tabular policies or linear critics, which limits their applicability to high-dimensional and continuous control problems. We propose a primal-dual natural actor-critic algorithm that integrates neural critic estimation with natural policy gradient updates and leverages Neural Tangent Kernel (NTK) theory to control function-approximation error under Markovian sampling, without requiring access to mixing-time oracles. We establish global convergence and cumulative constraint violation rates of $\tilde{\mathcal{O}}(T^{-1/4})$ up to approximation errors induced by the policy and critic classes. Our results provide the first such guarantees for CMDPs with general policies and multi-layer neural critics, substantially extending the theoretical foundations of actor-critic methods beyond the linear-critic regime.
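The primal-dual mechanism the abstract refers to can be illustrated on a toy constrained bandit. This is a minimal sketch, not the paper's algorithm: the reward/cost values, step sizes, and constraint budget below are all illustrative assumptions, the policy is a softmax over two actions, and exact gradients replace the neural critic and natural-gradient machinery. It shows only the core loop: primal ascent on the Lagrangian in the policy parameters, dual ascent on the constraint violation, with the multiplier projected to stay nonnegative.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical two-action constrained bandit (illustrative numbers only).
r = np.array([1.0, 0.5])   # per-action reward
c = np.array([1.0, 0.1])   # per-action cost
b = 0.4                    # constraint budget: expected cost <= b

theta = np.zeros(2)        # policy logits (primal variable)
lam = 0.0                  # Lagrange multiplier (dual variable)
eta_theta, eta_lam = 0.5, 0.5
avg_pi = np.zeros(2)

T = 2000
for t in range(T):
    pi = softmax(theta)
    avg_pi += pi / T
    f = r - lam * c                       # per-action Lagrangian payoff
    grad = pi * (f - pi @ f)              # exact policy gradient w.r.t. logits
    theta += eta_theta * grad             # primal ascent on the Lagrangian
    lam = max(0.0, lam + eta_lam * (pi @ c - b))  # projected dual ascent

# The averaged policy should trade reward against the cost constraint:
# here the constrained optimum mixes the cheap and the rewarding action.
avg_cost = float(avg_pi @ c)
avg_reward = float(avg_pi @ r)
```

The last-iterate policy can oscillate around the constrained optimum, which is why the averaged policy is reported; the paper's rates are likewise stated for time-averaged quantities (cumulative constraint violation over $T$ steps).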