Abstract
Chaotic convective flows arise in many real-world systems, such as microfluidic devices and chemical reactors. Stabilizing these flows is highly desirable but remains challenging, particularly in chaotic regimes where conventional control methods often fail. Reinforcement Learning (RL) has shown promise for control in laminar flow settings, but its ability to generalize and remain robust under chaotic and turbulent dynamics is not well explored, despite being critical for real-world deployment. In this work, we improve the practical feasibility of RL-based control of such flows, focusing on Rayleigh-Bénard Convection (RBC), a canonical model for convective heat transport. To enhance generalization and sample efficiency, we introduce domain-informed RL agents that are trained using Proximal Policy Optimization across diverse initial conditions and flow regimes. We incorporate domain knowledge into the reward function via a term that encourages Bénard cell merging, as an example of a desirable macroscopic property. In laminar flow regimes, the domain-informed RL agents reduce convective heat transport by up to 33%, and in chaotic flow regimes they still achieve a 10% reduction, significantly better than the conventional controllers used in practice. Comparing domain-informed with uninformed agents, we find that the domain-informed reward design yields steady flows, faster convergence during training, and generalization across flow regimes without retraining. Our work demonstrates that elegant domain-informed priors can greatly enhance the robustness of RL-based control of chaotic flows, bringing real-world deployment closer.
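To make the reward design concrete, the following is a minimal sketch of a domain-informed reward of the kind described: a penalty on convective heat transport (Nusselt number above the conductive baseline of 1) plus a bonus for fewer Bénard cells, i.e. for cell merging. The cell count is estimated here from sign changes of the vertical velocity along the horizontal mid-line; `count_cells`, the weight `lam`, and the exact form of the merging term are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def count_cells(w_midline: np.ndarray) -> int:
    """Rough estimate of the number of Bénard cells from the vertical
    velocity sampled along the horizontal mid-line of the domain.
    Each pair of sign changes (one updraft/downdraft pair) is counted
    as roughly one convection cell. Illustrative proxy only."""
    signs = np.sign(w_midline)
    signs = signs[signs != 0]          # ignore exact zeros
    changes = np.count_nonzero(signs[1:] != signs[:-1])
    return max(changes // 2, 1)

def reward(nusselt: float, w_midline: np.ndarray, lam: float = 0.1) -> float:
    """Hypothetical domain-informed reward: penalise convective heat
    transport (Nu > 1) and add a merging bonus that grows as the
    number of cells shrinks. `lam` trades off the two terms."""
    transport_penalty = -(nusselt - 1.0)
    merging_bonus = -lam * count_cells(w_midline)
    return transport_penalty + merging_bonus
```

At equal heat transport, a flow with fewer cells receives the higher reward, so the agent is nudged toward the merged, steadier flow states the paper describes; an uninformed agent would use the transport penalty alone.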