🤖 AI Summary
This work addresses the non-local thermodynamic equilibrium (non-LTE) radiative transfer problem for two-level atoms. We propose the first end-to-end deep reinforcement learning (DRL) framework, formulating the self-consistent computation of the depth-dependent source function $S(\tau)$ as a control task. A feedforward neural network policy is trained via the Soft Actor-Critic algorithm to directly optimize $S(\tau)$ within a classical, black-box radiative transfer engine, ensuring satisfaction of the statistical equilibrium equation. Crucially, we introduce the $\Lambda^*$-Free paradigm, eliminating reliance on the lambda operator and differentiable RT solvers, thereby enabling unsupervised, gradient-free enforcement of physical constraints. The method successfully recovers non-LTE self-consistent solutions on standard 1D atmospheric models, demonstrating the failure of conventional greedy strategies due to the "moving target" effect and revealing the critical role of the discount factor in convergence efficiency. The framework naturally generalizes to multi-dimensional geometries and strong velocity fields.
📝 Abstract
We present a novel reinforcement learning (RL) approach for solving the classical 2-level atom non-LTE radiative transfer problem by framing it as a control task in which an RL agent learns a depth-dependent source function $S(\tau)$ that self-consistently satisfies the equation of statistical equilibrium (SE). The agent's policy is optimized entirely via reward-based interactions with a radiative transfer engine, without explicit knowledge of the ground truth. This method bypasses the need for constructing approximate lambda operators ($\Lambda^*$) common in accelerated iterative schemes. Additionally, it requires no extensive precomputed labeled datasets to extract a supervisory signal, and avoids backpropagating gradients through the complex RT solver itself. Finally, we show through experiment that a simple feedforward neural network trained greedily cannot solve for SE, possibly due to the moving target nature of the problem. Our $\Lambda^*$-Free method offers potential advantages for complex scenarios (e.g., atmospheres with enhanced velocity fields, multi-dimensional geometries, or complex microphysics) where $\Lambda^*$ construction or solver differentiability is challenging. Additionally, the agent can be incentivized to find more efficient policies by manipulating the discount factor, leading to a reprioritization of immediate rewards. If demonstrated to generalize past its training data, this RL framework could serve as an alternative or accelerated formalism to achieve SE. To the best of our knowledge, this study represents the first application of reinforcement learning in solar physics that directly solves for a fundamental physical constraint.
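The reward-based framing above can be illustrated with a minimal sketch. For a two-level atom, SE requires $S = (1-\epsilon)\bar{J} + \epsilon B$, where $\bar{J}$ is returned by the (black-box, non-differentiable) RT engine; a natural reward is the negative norm of the SE residual. All names below (`formal_solve`, `eps`, `B`) are illustrative stand-ins, not the paper's actual API, and the exponential-kernel "solver" is a toy placeholder, not real radiative transfer:

```python
import numpy as np

def formal_solve(S, tau):
    """Toy stand-in for the black-box RT engine: returns a smoothed
    'mean intensity' Jbar from the trial source function (placeholder
    for the formal solution Lambda[S], NOT a real formal solver)."""
    K = np.exp(-np.abs(tau[:, None] - tau[None, :]))
    K /= K.sum(axis=1, keepdims=True)  # normalize rows so K @ const = const
    return K @ S

def se_reward(S, tau, eps, B):
    """Negative L2 norm of the statistical-equilibrium residual
    S - ((1 - eps) * Jbar + eps * B); zero at self-consistency."""
    Jbar = formal_solve(S, tau)
    residual = S - ((1.0 - eps) * Jbar + eps * B)
    return -np.linalg.norm(residual)

tau = np.logspace(-4, 2, 32)   # optical-depth grid
B = np.ones_like(tau)          # Planck function (constant toy atmosphere)
eps = 1e-2                     # photon destruction probability

# In this toy constant atmosphere S = B is exactly self-consistent,
# so its reward is ~0, while a perturbed guess scores strictly worse.
print(se_reward(B, tau, eps, B), se_reward(1.5 * B, tau, eps, B))
```

The agent never sees gradients of `formal_solve`; it only receives the scalar reward, which is what makes the scheme $\Lambda^*$-free and solver-agnostic.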