Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning

2026-04-29
Citations: 0
Influential: 0
AI Summary
This work addresses the safety risks that arise in offline reinforcement learning agents when the training data distribution differs from the deployment environment. The authors propose SAS (Self-Alignment for Safety), a framework that combines the Lyapunov condition with a test-time self-alignment mechanism. Specifically, a pretrained Transformer-based agent generates several imagined trajectories, keeps those that satisfy the Lyapunov condition, and recycles them into its context as control-invariant prompts, enabling safe adaptation without any parameter updates. The Transformer architecture also admits a hierarchical RL interpretation in which prompting acts as Bayesian inference over latent skills. Across Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and failure while maintaining or improving return.
Abstract
Offline reinforcement learning (RL) agents often fail when deployed, as the gap between training datasets and real environments leads to unsafe behavior. To address this, we present SAS (Self-Alignment for Safety), a transformer-based framework that enables test-time adaptation in offline safe RL without retraining. In SAS, the main mechanism is self-alignment: at test time, the pretrained agent generates several imagined trajectories and selects those satisfying the Lyapunov condition. These feasible segments are then recycled as in-context prompts, allowing the agent to realign its behavior toward safety while avoiding parameter updates. In effect, SAS turns Lyapunov-guided imagination into control-invariant prompts, and its transformer architecture admits a hierarchical RL interpretation where prompting functions as Bayesian inference over latent skills. Across Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and failure while maintaining or improving return.
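The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Lyapunov candidate (squared distance to the origin), the per-step decrease test, and the names `lyapunov`, `satisfies_lyapunov`, `select_prompts`, and `build_context` are all assumptions made for illustration, and the imagined trajectories are hard-coded rather than produced by a pretrained transformer.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Trajectory = List[State]


def lyapunov(state: State) -> float:
    """Hypothetical Lyapunov candidate: squared distance to the origin."""
    return sum(x * x for x in state)


def satisfies_lyapunov(
    traj: Trajectory,
    v: Callable[[State], float] = lyapunov,
    decay: float = 0.0,
) -> bool:
    """Assumed condition: V(s_{t+1}) <= (1 - decay) * V(s_t) at every step."""
    return all(v(nxt) <= (1.0 - decay) * v(cur) for cur, nxt in zip(traj, traj[1:]))


def select_prompts(candidates: List[Trajectory], max_prompts: int = 2) -> List[Trajectory]:
    """Keep the imagined trajectories that pass the check, up to max_prompts."""
    feasible = [t for t in candidates if satisfies_lyapunov(t)]
    return feasible[:max_prompts]


def build_context(prompts: List[Trajectory], history: Trajectory) -> Trajectory:
    """Prepend the selected segments to the recent history; no weights change."""
    context: Trajectory = []
    for segment in prompts:
        context.extend(segment)
    context.extend(history)
    return context


# Stand-ins for trajectories imagined by the pretrained agent (assumed data).
candidates = [
    [(1.0, 0.0), (0.5, 0.0), (0.25, 0.0)],  # V decreases at every step
    [(1.0, 0.0), (1.5, 0.0), (0.5, 0.0)],   # V increases at the first step
    [(0.0, 2.0), (0.0, 1.0), (0.0, 0.5)],   # V decreases at every step
]
prompts = select_prompts(candidates, max_prompts=2)
context = build_context(prompts, history=[(0.8, 0.1)])
```

In this toy setup the second candidate is rejected, and the remaining two segments are placed ahead of the current history. In the paper's setting the context would hold full transformer tokens rather than bare states, so this only shows the filter-then-prompt control flow.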
Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning
safe reinforcement learning
test-time adaptation
safety
distributional shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time adaptation
offline reinforcement learning
Lyapunov stability
in-context learning
safe RL