🤖 AI Summary
Online fine-tuning of offline-learned policies often suffers from significant early performance degradation, primarily because premature exploration overrides the initial policy. This work proposes a progressive exploration mechanism that accelerates environmental adaptation while preserving the stability of the initial policy. Our core contributions are: (i) the first dynamic exploration-gating scheme guided by online performance estimation, ensuring monotonic performance improvement throughout fine-tuning and avoiding the performance valley inherent in conventional methods; and (ii) integration with the Jump Start framework, unifying online performance evaluation, confidence-bound-guided exploration scheduling, and policy interpolation. Evaluated across diverse control tasks, our approach reduces the average performance drop by 87%, accelerates convergence by 3.2×, and eliminates persistent degradation in all tasks.
📝 Abstract
Fine-tuning policies learned offline remains a major challenge in applied domains. Monotonic performance improvement during *fine-tuning* is often difficult to achieve, as agents typically experience performance degradation in the early fine-tuning stage. The community has identified multiple difficulties in fine-tuning a learned network online; however, the majority of progress has focused on improving learning efficiency during fine-tuning. In practice, this comes at a serious cost: initially, agent performance degrades as the agent explores and effectively overrides the policy learned offline. We show that, across a range of settings, many offline-to-online algorithms exhibit either (1) performance degradation or (2) slow learning (sometimes effectively no improvement) during fine-tuning. We introduce a new fine-tuning algorithm, based on the Jump Start algorithm, that gradually allows more exploration based on online estimates of performance. Empirically, this approach achieves fast fine-tuning and significantly reduces performance degradation compared with existing algorithms designed to do the same.
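The core mechanism described above — following the offline guide policy for an initial horizon and shrinking that horizon only when an online confidence bound on performance clears the offline baseline — can be sketched as follows. This is a minimal illustration under assumptions of ours: the function names (`lcb`, `update_guide_horizon`, `act`), the mean-minus-standard-error confidence bound, and the fixed shrink step are all hypothetical placeholders, not the paper's exact estimator or schedule.

```python
def lcb(returns, beta=1.0):
    """Lower confidence bound on mean episodic return.
    Mean minus beta * standard error (an assumed estimator;
    the paper's exact bound is not specified here)."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / max(n - 1, 1)
    return mean - beta * (var / n) ** 0.5

def update_guide_horizon(h, returns, offline_baseline, step=10):
    """Shrink the guide-policy horizon h (i.e. allow more exploration)
    only once the online LCB of performance clears the offline baseline;
    otherwise keep the guide policy in control, avoiding the early
    performance valley."""
    if len(returns) >= 5 and lcb(returns) >= offline_baseline:
        h = max(h - step, 0)
    return h

def act(t, h, guide_policy, explore_policy, state):
    """Jump Start-style rollout: follow the offline guide policy for the
    first h steps of an episode, then hand control to the fine-tuning
    (exploration) policy."""
    return guide_policy(state) if t < h else explore_policy(state)
```

The gating direction is the key design point: exploration is only ever unlocked after measured online performance justifies it, so the schedule degrades gracefully to "pure guide policy" when estimates are poor.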