Fine-Tuning without Performance Degradation

📅 2025-05-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Offline policy online fine-tuning often suffers from significant early performance degradation, primarily due to premature exploration overriding the initial policy. This work proposes a progressive exploration mechanism that accelerates environmental adaptation while preserving initial policy stability. Our core contributions are: (i) the first dynamic exploration gating scheme guided by online performance estimation, ensuring monotonic performance improvement throughout fine-tuning and overcoming the inevitable performance valley inherent in conventional methods; and (ii) integration with the Jump Start framework, unifying online performance evaluation, confidence-bound-guided exploration scheduling, and policy interpolation. Evaluated across diverse control tasks, our approach reduces average performance drop by 87%, accelerates convergence by 3.2×, and eliminates persistent degradation in all tasks.

Technology Category

Application Category

📝 Abstract
Fine-tuning policies learned offline remains a major challenge in application domains. Monotonic performance improvement during emph{fine-tuning} is often challenging, as agents typically experience performance degradation at the early fine-tuning stage. The community has identified multiple difficulties in fine-tuning a learned network online, however, the majority of progress has focused on improving learning efficiency during fine-tuning. In practice, this comes at a serious cost during fine-tuning: initially, agent performance degrades as the agent explores and effectively overrides the policy learned offline. We show across a range of settings, many offline-to-online algorithms exhibit either (1) performance degradation or (2) slow learning (sometimes effectively no improvement) during fine-tuning. We introduce a new fine-tuning algorithm, based on an algorithm called Jump Start, that gradually allows more exploration based on online estimates of performance. Empirically, this approach achieves fast fine-tuning and significantly reduces performance degradations compared with existing algorithms designed to do the same.
Problem

Research questions and friction points this paper is trying to address.

Preventing performance degradation during offline-to-online fine-tuning
Balancing exploration and offline policy retention in fine-tuning
Achieving fast fine-tuning without initial performance drops
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Jump Start-based fine-tuning algorithm
Gradually increases exploration based on performance
Reduces performance degradation during fine-tuning
🔎 Similar Papers
No similar papers found.
H
Han Wang
Department of Computing Science, University of Alberta, Canada; Alberta Machine Intelligence Institute (Amii)
Adam White
Adam White
University of Alberta, Amii (Alberta Machine Intelligence Institute)
Artificial IntelligenceReinforcement Learning
Martha White
Martha White
University of Alberta
Machine Learning