Mapping representations in Reinforcement Learning via Semantic Alignment for Zero-Shot Stitching

📅 2025-02-26

🤖 AI Summary

Reinforcement learning agents generalize poorly under visual or task variations, which necessitates costly retraining and hinders policy reuse. To address this, we propose a zero-shot cross-agent representation mapping method. It estimates affine or orthogonal transformations between latent spaces from semantically aligned anchor points, which enables policy stitching without fine-tuning. Our approach is the first to support modular, compositional zero-shot policy recombination. It bypasses conventional transfer learning's reliance on target-domain data or parameter adaptation. We evaluate the method in the CarRacing environment under concurrent background and task shifts, where it achieves high-performance zero-shot policy composition: the average return retains over 95% of the original policy's performance. This substantially improves policy robustness and reusability in dynamic environments.

📝 Abstract

Deep Reinforcement Learning (RL) models often fail to generalize when even small changes occur in the environment's observations or task requirements. Addressing these shifts typically requires costly retraining, which limits the reusability of learned policies. In this paper, we build on recent work in semantic alignment to propose a zero-shot method for mapping between the latent spaces of agents trained on different visual and task variations. Specifically, we learn a transformation that maps embeddings from one agent's encoder to another agent's encoder without further fine-tuning. Our approach relies on a small set of "anchor" observations that are semantically aligned, and we use these anchors to estimate an affine or orthogonal transform. Once the transformation is found, an existing controller trained for one domain can interpret embeddings from a different (existing) encoder in a zero-shot fashion, with no additional training. We empirically demonstrate on the CarRacing environment, with changing backgrounds and tasks, that zero-shot stitching preserves high performance under visual and task domain shifts. By allowing modular re-assembly of existing policies, this approach paves the way for more robust, compositional RL in dynamically changing environments.
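The abstract describes the stitching pipeline only at a high level. The sketch below is a minimal illustration that assumes row-wise NumPy embeddings. It fits the affine map by ordinary least squares and the orthogonal map by the classical orthogonal Procrustes solution (SVD), then composes source encoder, transform, and target controller into one policy. All names here (`fit_affine`, `fit_orthogonal`, `make_stitched_policy`, the toy encoders and controller) are hypothetical stand-ins, not the paper's code. The demo also makes two simplifying assumptions: the two latent spaces differ by an exact rotation, and identical observations are fed to both encoders. Real anchors would instead be semantically aligned pairs, for instance the same driving state rendered under two different backgrounds.

```python
import numpy as np


def fit_affine(x_src: np.ndarray, y_tgt: np.ndarray):
    """Least-squares affine map so that y ≈ x @ a.T + b.

    x_src: (N, d_s) anchor embeddings from the source agent's encoder.
    y_tgt: (N, d_t) embeddings of the paired anchors from the target agent's encoder.
    """
    n = x_src.shape[0]
    # Append a constant column so the bias b is estimated jointly with a.
    x_h = np.hstack([x_src, np.ones((n, 1))])
    w, *_ = np.linalg.lstsq(x_h, y_tgt, rcond=None)  # w has shape (d_s + 1, d_t)
    return w[:-1].T, w[-1]


def fit_orthogonal(x_src: np.ndarray, y_tgt: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: R minimizing ||x @ R.T - y||_F subject to R.T @ R = I.

    Requires d_s == d_t; row i of x_src is paired with row i of y_tgt.
    """
    m = y_tgt.T @ x_src           # (d, d) cross-covariance of the paired anchors
    u, _, vt = np.linalg.svd(m)   # m = u @ diag(s) @ vt
    return u @ vt


def make_stitched_policy(encoder_src, transform, controller_tgt):
    """Compose source encoder -> latent transform -> target controller (no fine-tuning)."""
    def policy(obs):
        return controller_tgt(transform(encoder_src(obs)))
    return policy


# ---- Toy demo: hypothetical linear "encoders" and a tanh "controller" ----
rng = np.random.default_rng(0)
obs_dim, d, n_anchors = 16, 8, 64

w_src = rng.normal(size=(d, obs_dim))              # toy source-encoder weights
q_true, _ = np.linalg.qr(rng.normal(size=(d, d)))  # unknown rotation between latent spaces
w_ctrl = rng.normal(size=(3, d))                   # 3 outputs, like CarRacing's steer/gas/brake


def encoder_src(obs):
    return obs @ w_src.T


def encoder_tgt(obs):
    # Synthetic assumption: target latents are an exact rotation of source latents.
    return encoder_src(obs) @ q_true.T


def controller_tgt(z):
    return np.tanh(z @ w_ctrl.T)


# Simplification: the same observations serve as anchors for both encoders.
anchor_obs = rng.normal(size=(n_anchors, obs_dim))
x_anchor, y_anchor = encoder_src(anchor_obs), encoder_tgt(anchor_obs)

r = fit_orthogonal(x_anchor, y_anchor)
a, b = fit_affine(x_anchor, y_anchor)

stitched_ortho = make_stitched_policy(encoder_src, lambda z: z @ r.T, controller_tgt)
stitched_affine = make_stitched_policy(encoder_src, lambda z: z @ a.T + b, controller_tgt)

test_obs = rng.normal(size=(5, obs_dim))
reference = controller_tgt(encoder_tgt(test_obs))  # actions of the original target agent
print(np.allclose(stitched_ortho(test_obs), reference))   # True
print(np.allclose(stitched_affine(test_obs), reference))  # True
```

With real encoders the anchor pairs are noisy, so the stitched actions only approximate the original agent's actions. The two transform families trade off differently. The orthogonal fit has fewer free parameters (d(d-1)/2 versus d_t(d_s+1)) but needs equal latent sizes. The affine fit can also absorb shifts and scale differences between the spaces.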

Problem

Research questions and friction points this paper is trying to address.

Generalization failure in RL models due to environmental changes.

Costly retraining limits policy reusability across different tasks.

Zero-shot mapping between latent spaces for modular policy reuse.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot mapping between latent spaces

Semantic alignment with anchor observations

Affine or orthogonal transform for embeddings
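The source names the two transform families but does not state their objectives. Under the standard least-squares formulations, the estimation problems and the resulting stitched policy can be written as follows. These formulations are an assumption and are not taken from the paper. All symbols are introduced here for illustration: E_s and E_t are the source and target encoders, C_t is the target controller, (o_i^s, o_i^t) for i = 1, ..., N are semantically aligned anchor pairs, x_i = E_s(o_i^s), y_i = E_t(o_i^t), and X, Y stack the x_i and y_i as columns.

```latex
\begin{aligned}
(A^\star, b^\star) &= \arg\min_{A \in \mathbb{R}^{d_t \times d_s},\, b \in \mathbb{R}^{d_t}}
  \sum_{i=1}^{N} \lVert A x_i + b - y_i \rVert_2^2 \\
R^\star &= \arg\min_{R^\top R = I} \lVert R X - Y \rVert_F^2 = U V^\top,
  \qquad U \Sigma V^\top = Y X^\top \ \text{(SVD)},\ d_s = d_t \\
\pi_{\text{stitched}}(o^s) &= C_t\big(T(E_s(o^s))\big),
  \qquad T \in \{\, x \mapsto A^\star x + b^\star,\; x \mapsto R^\star x \,\}
\end{aligned}
```

The orthogonal solution is the classical orthogonal Procrustes result. Because R^\star preserves distances and angles between embeddings, it can only relate latent spaces of the same dimension.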
