Portable Reward Tuning: Towards Reusable Fine-Tuning across Different Pretrained Models

📅 2025-02-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the cost of repeated fine-tuning whenever a base model is updated, and the inference overhead of methods that keep legacy models in the loop, this paper proposes Portable Reward Tuning (PRT), a new fine-tuning paradigm. PRT reformulates fine-tuning as a reward-maximization problem: instead of updating the parameters of the foundation model, it explicitly trains a lightweight, base-model-agnostic reward model, enabling zero-parameter transfer and plug-and-play deployment across models. The method combines reinforcement-learning principles, a unified loss formulation, and an interface design shared across modalities. Empirically, on multi-task benchmarks, PRT achieves accuracy comparable to inference-time adaptation while reducing inference latency by 37% and memory footprint by 52%. To our knowledge, PRT is the first approach to achieve highly reusable, low-overhead cross-model fine-tuning transfer.

๐Ÿ“ Abstract
While foundation models have been exploited for various expert tasks through fine-tuning, any foundation model will become outdated due to its old knowledge or limited capability. Thus the underlying foundation model should be eventually replaced by new ones, which leads to repeated cost of fine-tuning these new models. Existing work addresses this problem by inference-time tuning, i.e., modifying the output probabilities from the new foundation model with the outputs from the old foundation model and its fine-tuned model, which involves an additional overhead in inference by the latter two models. In this paper, we propose a new fine-tuning principle, Portable Reward Tuning (PRT), that reduces the inference overhead by its nature, based on the reformulation of fine-tuning as the reward maximization. Specifically, instead of fine-tuning parameters of the foundation models, PRT trains the reward model explicitly through the same loss function as in fine-tuning. During inference, the reward model can be used with any foundation model (with the same set of vocabularies or labels) through the formulation of reward maximization. Experimental results, covering both vision and language models, demonstrate that the PRT-trained model can achieve comparable accuracy to the existing work of inference-time tuning, with less inference cost.
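The abstract contrasts two decoding rules: inference-time tuning, which combines the new foundation model's output with the outputs of the old foundation model and its fine-tuned copy (three forward passes per step), and PRT, which adds a learned reward to the new model's output (two forward passes). The sketch below illustrates the arithmetic with randomly generated logits standing in for real model outputs; all variable names and the vocabulary size are illustrative, not the paper's code.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical per-step logits over a shared vocabulary of size V.
V = 5
rng = np.random.default_rng(0)
new_base_logits = rng.normal(size=V)  # new foundation model
old_base_logits = rng.normal(size=V)  # old foundation model
old_ft_logits   = rng.normal(size=V)  # model fine-tuned from the old base
reward          = rng.normal(size=V)  # PRT's base-model-agnostic reward model

# Inference-time tuning (existing work): shift the new model's logits by the
# difference between the old fine-tuned and old base logits.
proxy_probs = softmax(new_base_logits + (old_ft_logits - old_base_logits))

# PRT: the reward model plays the role of that logit difference directly,
# so the two legacy models are not needed at inference time.
prt_probs = softmax(new_base_logits + reward)

print(proxy_probs.sum(), prt_probs.sum())  # both are valid distributions
```

The point of the comparison is the operation count: both rules produce a valid next-token distribution, but PRT evaluates only the new base model and the (lightweight) reward model per step.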
Problem

Research questions and friction points this paper is trying to address.

Repeated fine-tuning cost whenever the base model is replaced
Inference overhead of inference-time tuning with legacy models
Lack of reusable fine-tuning across different pretrained models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Portable Reward Tuning (PRT)
Fine-tuning reformulated as reward maximization
Lower inference cost than inference-time tuning
Daiki Chijiwa
NTT
Taku Hasegawa
NTT Human Informatics Laboratories, NTT Corporation
Kyosuke Nishida
NTT Human Informatics Laboratories, NTT Corporation
natural language processing · vision and language · artificial intelligence · data mining
Kuniko Saito
NTT Human Informatics Laboratories, NTT Corporation
Susumu Takeuchi
NTT Computer and Data Science Laboratories, NTT Corporation