🤖 AI Summary
Offline reinforcement learning (RL) policies lack robustness against action-space perturbations such as actuator failures. To address this, we propose an offline-to-online adversarial fine-tuning framework, the first to bring adversarial fine-tuning into offline RL. The framework injects controllable action perturbations and applies a performance-aware adaptive curriculum: the perturbation probability is adjusted dynamically via an exponential moving average of performance, enhancing robustness without degrading nominal policy performance. Our method combines offline pretraining, online adversarial fine-tuning, and perturbation injection, and requires no additional online exploration. Evaluated on continuous-control locomotion tasks, it substantially improves disturbance resilience over purely offline baselines, converges faster than training from scratch, and achieves the strongest robustness when the fine-tuning and test perturbations match. These results confirm both the effectiveness and the practicality of the proposed framework.
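The perturbation-injection step described above could be sketched as an environment wrapper that corrupts the executed action with some probability. This is a minimal, hypothetical sketch, not the paper's actual code: the class name `ActionPerturbationWrapper`, the fault modes (zeroing one actuator dimension vs. additive Gaussian noise), and all parameter values are assumptions for illustration.

```python
import numpy as np

class ActionPerturbationWrapper:
    """Hypothetical sketch of action-space perturbation injection.

    With probability `perturb_prob`, the executed action is corrupted
    before reaching the environment: either one actuator dimension is
    zeroed (mimicking an actuator fault) or bounded Gaussian noise is
    added to the whole action vector.
    """

    def __init__(self, env, perturb_prob=0.1, noise_scale=0.3, rng=None):
        self.env = env
        self.perturb_prob = perturb_prob  # probability of perturbing a step
        self.noise_scale = noise_scale    # std of the additive noise mode
        self.rng = rng or np.random.default_rng()

    def step(self, action):
        action = np.asarray(action, dtype=np.float64).copy()
        if self.rng.random() < self.perturb_prob:
            if self.rng.random() < 0.5:
                # Actuator failure: zero one randomly chosen action dimension.
                action[self.rng.integers(action.shape[0])] = 0.0
            else:
                # Additive noise on the full action vector.
                action += self.rng.normal(0.0, self.noise_scale, size=action.shape)
        return self.env.step(action)
```

During fine-tuning, the policy still outputs its intended action; only the action actually executed is perturbed, which is what induces the compensatory behavior the summary refers to.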
📝 Abstract
Offline reinforcement learning enables sample-efficient policy acquisition without risky online interaction, yet policies trained on static datasets remain brittle under action-space perturbations such as actuator faults. This study introduces an offline-to-online framework that trains policies on clean data and then performs adversarial fine-tuning, where perturbations are injected into executed actions to induce compensatory behavior and improve resilience. A performance-aware curriculum further adjusts the perturbation probability during training via an exponential-moving-average signal, balancing robustness and stability throughout the learning process. Experiments on continuous-control locomotion tasks demonstrate that the proposed method consistently improves robustness over offline-only baselines and converges faster than training from scratch. Matching the fine-tuning and evaluation conditions yields the strongest robustness to action-space perturbations, while the adaptive curriculum strategy mitigates the degradation of nominal performance observed with the linear curriculum strategy. Overall, the results show that adversarial fine-tuning enables adaptive and robust control in uncertain environments, bridging the gap between offline efficiency and online adaptability.
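The performance-aware curriculum could take the following shape: track an exponential moving average (EMA) of episode returns and raise the perturbation probability only while performance stays near a clean baseline, lowering it again when the EMA drops. This is a hypothetical sketch of one plausible realization; the class name `AdaptiveCurriculum`, the tolerance threshold, and the step sizes are illustrative assumptions, not values from the paper.

```python
class AdaptiveCurriculum:
    """Hypothetical sketch of an EMA-driven perturbation schedule.

    The perturbation probability grows while the smoothed return stays
    above `tolerance * baseline`, and shrinks when it falls below,
    trading off robustness against nominal-performance degradation.
    """

    def __init__(self, baseline_return, alpha=0.1, tolerance=0.9,
                 step_size=0.02, max_prob=0.5):
        self.baseline = baseline_return  # return of the offline-pretrained policy
        self.alpha = alpha               # EMA smoothing factor
        self.tolerance = tolerance       # allowed fraction of baseline performance
        self.step_size = step_size       # per-episode change in perturbation prob.
        self.max_prob = max_prob         # cap on the perturbation probability
        self.ema = baseline_return
        self.perturb_prob = 0.0

    def update(self, episode_return):
        # Smooth the noisy per-episode return signal.
        self.ema = (1 - self.alpha) * self.ema + self.alpha * episode_return
        if self.ema >= self.tolerance * self.baseline:
            # Performance holds up: make training harder.
            self.perturb_prob = min(self.max_prob,
                                    self.perturb_prob + self.step_size)
        else:
            # Performance degraded: back off the perturbations.
            self.perturb_prob = max(0.0, self.perturb_prob - self.step_size)
        return self.perturb_prob
```

Unlike a linear schedule that increases the perturbation probability unconditionally, this feedback loop only escalates difficulty when the policy can absorb it, which is the mechanism the abstract credits for preserving nominal performance.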