🤖 AI Summary
This work studies infinite-horizon average-reward constrained Markov decision processes (CMDPs) under the unichain assumption with general policy parameterizations, a setting where prior regret analyses break down because they rely on ergodicity or known mixing times that fail in the presence of transient states. The proposed primal-dual natural actor-critic algorithm combines multi-level Monte Carlo (MLMC) estimators with an explicit burn-in mechanism, removing the need for mixing-time oracles. The resulting analysis yields finite-time bounds of order $\tilde{O}(\sqrt{T})$ on both regret and cumulative constraint violation, up to policy and critic approximation errors, thereby extending order-optimal guarantees to a substantially broader class of CMDPs.
📝 Abstract
We study infinite-horizon average-reward constrained Markov decision processes (CMDPs) under the unichain assumption and general policy parameterizations. Existing regret analyses for constrained reinforcement learning largely rely on ergodicity or strong mixing-time assumptions, which fail to hold in the presence of transient states. We propose a primal-dual natural actor-critic algorithm that leverages multi-level Monte Carlo (MLMC) estimators and an explicit burn-in mechanism to handle unichain dynamics without requiring mixing-time oracles. Our analysis establishes finite-time regret and cumulative constraint violation bounds that scale as $\tilde{O}(\sqrt{T})$, up to approximation errors arising from policy and critic parameterization, thereby extending order-optimal guarantees to a significantly broader class of CMDPs.
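For concreteness, one common way to formalize the guarantees stated above (an illustrative convention, not necessarily the exact definitions used in the paper) measures regret against the best constraint-satisfying stationary policy and accumulates constraint shortfalls over the horizon $T$:

$$
\mathrm{Reg}(T) \;=\; T\,J_r^{\pi^*} \;-\; \sum_{t=1}^{T} r(s_t, a_t),
\qquad
\mathrm{Viol}(T) \;=\; \Bigl[\,\sum_{t=1}^{T}\bigl(b - c(s_t, a_t)\bigr)\Bigr]_{+},
$$

where $J_r^{\pi^*}$ denotes the optimal average reward among policies whose long-run average utility meets the threshold $b$, and $[x]_+ = \max(x, 0)$. Under this reading, the abstract's claim is that both quantities grow as $\tilde{O}(\sqrt{T})$, up to the function-approximation errors introduced by the policy and critic parameterizations.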