🤖 AI Summary
Existing policy combination methods in reinforcement learning lack theoretical guarantees on stability and convergence. Method: This paper proposes Multi-CALF, the first framework to unify statistically grounded weighted policy composition with formally verified robust fallback policies. It leverages value-function-based relative-improvement analysis and stochastic stability theory to derive a lower bound on the probability of convergence to a target set, an upper bound on maximum trajectory deviation, and an upper bound on convergence time. Results: Multi-CALF significantly outperforms single-policy baselines across multiple control tasks while guaranteeing, with a user-specified probability, convergence to the target set and bounded state deviation. Its core contribution is the joint design of policy fusion and formal stability guarantees, filling a critical theoretical gap in RL policy composition.
📝 Abstract
We introduce Multi-CALF, an algorithm that intelligently combines reinforcement learning policies based on their relative value improvements. Our approach integrates a standard RL policy with a theoretically backed alternative policy, inheriting formal stability guarantees while often achieving better performance than either policy individually. We prove that our combined policy converges to a specified goal set with known probability and provide precise bounds on maximum deviation and convergence time. Empirical validation on control tasks demonstrates enhanced performance while maintaining stability guarantees.
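To make the switching idea concrete, here is a minimal, hedged sketch of one plausible CALF-style decision rule: at each step the RL action is accepted only if the critic value improves by at least a margin over the best value recorded so far; otherwise the agent falls back to the stability-backed policy. All names (`critic`, `pi_rl`, `pi_safe`) and the specific improvement test are illustrative assumptions, not the paper's exact formulation.

```python
def calf_style_switch(critic, pi_rl, pi_safe, state, best_value, nu=1e-3):
    """Illustrative CALF-style switch (assumed form, not the paper's exact rule).

    critic:     state -> scalar value estimate (hypothetical callable)
    pi_rl:      state -> action, the learned RL policy
    pi_safe:    state -> action, the formally backed fallback policy
    best_value: best critic value recorded so far along the trajectory
    nu:         required relative improvement margin

    Returns the chosen action and the updated best value.
    """
    v = critic(state)
    if v >= best_value + nu:
        # Critic value improved enough: trust the RL policy this step.
        return pi_rl(state), v
    # No sufficient improvement: fall back to the safe policy,
    # keeping the previous best value as the benchmark.
    return pi_safe(state), best_value
```

In this toy form, the fallback policy is invoked exactly when the learned critic fails its relative-improvement test, which is how such schemes inherit the fallback's stability guarantees while still exploiting the RL policy when it is making progress.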