🤖 AI Summary
To address negative transfer, domain adaptation challenges, and inefficient source policy selection in reinforcement learning (RL) transfer, this paper proposes a knowledge transfer framework grounded in multimodal task similarity. The method jointly models visual frames and textual descriptions to construct a unified latent representation of environment dynamics; source policies are then selected and adaptively transferred according to similarity metrics in the learned embedding space. Its key innovation is the first integration of vision-language cross-modal representations for task similarity assessment in policy transfer, which mitigates negative transfer. Experiments in a multi-track racing environment show that, compared to from-scratch training, the approach reaches equivalent final performance while reducing training steps by 42%–67%.
📝 Abstract
Transfer Learning (TL) offers the potential to accelerate learning by transferring knowledge across tasks. However, it faces critical challenges such as negative transfer, domain adaptation, and inefficiency in selecting suitable source policies. These issues are especially problematic in evolving domains, e.g. game development, where scenarios change and agents must adapt. Continuously releasing new agents is costly and inefficient. In this work we address these key issues in TL to improve knowledge transfer and agents' performance across tasks and to reduce computational costs. The proposed methodology, called FAST (Framework for Adaptive Similarity-based Transfer), leverages visual frames and textual descriptions to create a latent representation of task dynamics, which is used to estimate the similarity between environments. The similarity scores guide our method in choosing candidate policies from which to transfer abilities, simplifying the learning of novel tasks. Experimental results over multiple racing tracks demonstrate that FAST achieves competitive final performance compared to learning-from-scratch methods while requiring significantly fewer training steps. These findings highlight the potential of embedding-driven task similarity estimation.
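The selection step described in the abstract can be illustrated with a minimal sketch. It is not the FAST implementation: it assumes each task has already been encoded into a fixed-length embedding, and it uses cosine similarity as the metric, which the abstract does not specify. The function names, the track names, the embedding values, and the `min_similarity` threshold (a hypothetical guard against negative transfer that falls back to training from scratch) are all illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_source_policy(target_embedding, source_embeddings, min_similarity=0.5):
    """Pick the source task whose embedding is most similar to the target.

    Returns (task_name, score). task_name is None when no source reaches
    min_similarity, signalling that training from scratch may be safer.
    The threshold is an assumption of this sketch, not part of the paper.
    """
    best_name, best_score = None, -1.0
    for name, embedding in source_embeddings.items():
        score = cosine_similarity(target_embedding, embedding)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_similarity:
        return None, best_score
    return best_name, best_score


# Hypothetical embeddings for two already-solved tracks and one new track.
source_embeddings = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.1, 0.8, 0.3],
}
target = [0.8, 0.2, 0.1]

chosen, score = select_source_policy(target, source_embeddings)
print(chosen, round(score, 3))
```

In this toy setup the new track lies closest to `track_a`, so that track's policy would be the candidate for initializing training on the new task. How the embeddings are produced from frames and text, and how the selected policy is adapted, are outside the scope of this sketch.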