Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the safety risks that arise in offline reinforcement learning when the training data distribution differs from deployment conditions. The authors propose SAS (Self-Alignment for Safety), a framework that combines the Lyapunov condition with a test-time self-alignment mechanism. A pretrained Transformer-based agent generates imagined trajectories, keeps those that satisfy the Lyapunov condition, and feeds them back into its context as control-invariant prompts, which lets it adapt toward safe behavior without any parameter updates. The Transformer architecture also admits a hierarchical RL interpretation in which prompting acts as Bayesian inference over latent skills. On Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and failure while maintaining or improving return.

📝 Abstract

Offline reinforcement learning (RL) agents often fail when deployed, as the gap between training datasets and real environments leads to unsafe behavior. To address this, we present SAS (Self-Alignment for Safety), a transformer-based framework that enables test-time adaptation in offline safe RL without retraining. In SAS, the main mechanism is self-alignment: at test time, the pretrained agent generates several imagined trajectories and selects those satisfying the Lyapunov condition. These feasible segments are then recycled as in-context prompts, allowing the agent to realign its behavior toward safety while avoiding parameter updates. In effect, SAS turns Lyapunov-guided imagination into control-invariant prompts, and its transformer architecture admits a hierarchical RL interpretation where prompting functions as Bayesian inference over latent skills. Across Safety Gymnasium and MuJoCo benchmarks, SAS consistently reduces cost and failure while maintaining or improving return.
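The self-alignment loop described above (imagine, filter by the Lyapunov condition, recycle as prompts) can be illustrated with a minimal sketch. The abstract does not give the paper's Lyapunov function, selection rule, or prompt format, so everything here is an assumption made for illustration: states are plain floats, the candidate function is `V(s) = s^2`, the condition is taken to be the standard discrete-time decrease `V(s_{t+1}) - V(s_t) <= 0`, feasible trajectories are ranked by their final `V` value, and the names `satisfies_lyapunov`, `select_prompts`, and `build_context` are hypothetical.

```python
from typing import Callable, List

State = float                 # hypothetical 1-D state, for illustration only
Trajectory = List[State]


def satisfies_lyapunov(traj: Trajectory, V: Callable[[State], float],
                       tol: float = 0.0) -> bool:
    """Return True if V never increases by more than tol along the trajectory."""
    return all(V(nxt) - V(cur) <= tol for cur, nxt in zip(traj, traj[1:]))


def select_prompts(candidates: List[Trajectory], V: Callable[[State], float],
                   max_prompts: int = 2) -> List[Trajectory]:
    """Keep imagined trajectories that pass the Lyapunov check.

    Ranking by the final V value is an assumption of this sketch,
    not something stated in the source.
    """
    feasible = [t for t in candidates if satisfies_lyapunov(t, V)]
    feasible.sort(key=lambda t: V(t[-1]))
    return feasible[:max_prompts]


def build_context(prompts: List[Trajectory], history: Trajectory) -> Trajectory:
    """Prepend the selected segments to the recent history (no weight updates)."""
    context: Trajectory = []
    for segment in prompts:
        context.extend(segment)
    context.extend(history)
    return context


def V(s: State) -> float:
    """Assumed quadratic Lyapunov candidate."""
    return s * s


# Three imagined rollouts; the second one moves away from the origin.
candidates = [[1.0, 0.5, 0.25], [1.0, 1.5, 2.0], [0.8, 0.4, 0.1]]
prompts = select_prompts(candidates, V, max_prompts=2)
context = build_context(prompts, history=[0.9])
```

In this toy setting the diverging rollout is discarded and the two contracting rollouts become the prompt prefix. In an actual agent the candidates would come from the pretrained transformer's own rollouts, and the context would hold full state, action, and cost tokens rather than scalars.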

Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning

safe reinforcement learning

test-time adaptation

safety

distributional shift

Innovation

Methods, ideas, or system contributions that make the work stand out.

test-time adaptation

offline reinforcement learning

Lyapunov stability