CKT-WAM: Parameter-Efficient Context Knowledge Transfer Between World Action Models

📅 2026-05-07
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the difficulty of transferring knowledge between heterogeneous World Action Models (WAMs), which stems from mismatched latent interfaces, high adaptation cost, and the rigidity of conventional distillation objectives. The authors propose CKT-WAM, a framework that builds a compact context in the text embedding space: learnable-query cross-attention compresses intermediate teacher hidden states, and the compressed tokens pass through an always-on generalized adapter, a lightweight router, and sparsely activated specialized adapters. The resulting context is appended to the student's conditioning text embeddings, so transfer requires few trainable parameters and avoids both output imitation and dense feature alignment. On LIBERO-Plus, CKT-WAM reaches a total success rate of 86.1% with only 1.17% trainable parameters, approaching full fine-tuning performance, and it achieves an average success rate of 83.3% across four real-world multi-step, long-horizon manipulation tasks.
📝 Abstract
World action models (WAMs) provide a powerful generative framework for embodied control, yet transferring knowledge across heterogeneous WAMs remains challenging due to mismatched latent interfaces, high adaptation cost, and the rigidity of conventional distillation objectives. We propose CKT-WAM, a parameter-efficient Context Knowledge Transfer framework that transfers a teacher WAM's knowledge into a student WAM through a compact context in the text embedding space, rather than through output imitation or dense hidden-state matching. Specifically, CKT-WAM extracts intermediate teacher hidden states, reduces the number of tokens via learnable-query cross-attention (LQCA) compressors, and transforms them through an always-on generalized adapter, a lightweight router, and sparsely activated specialized adapters. The resulting context is then appended to the student's conditioning text embeddings, injecting the transferred knowledge into the student with minimal architectural modification. Experiments show that CKT-WAM consistently improves zero-shot generalization and achieves the best overall performance on LIBERO-Plus, reaching an 86.1% total success rate with only 1.17% trainable parameters while approaching full fine-tuning performance. Beyond simulation, CKT-WAM also demonstrates strong real-world long-horizon manipulation ability, achieving the best average success rate of 83.3% across four multi-step, long-horizon tasks. Code is available at https://github.com/YuhuaJiang2002/CKT-WAM.
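The pipeline described in the abstract (LQCA compression, an always-on adapter, routed sparse adapters, then appending to the text embeddings) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the dimensions, the bottleneck adapter design, the mean-pooled top-k routing, the final projection `w_proj`, and every function and parameter name below are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lqca_compress(teacher_hidden, queries, w_k, w_v):
    """M learnable queries attend over N teacher tokens, giving M tokens."""
    k = teacher_hidden @ w_k                              # (N, d)
    v = teacher_hidden @ w_v                              # (N, d)
    scores = queries @ k.T / np.sqrt(queries.shape[-1])   # (M, N)
    return softmax(scores) @ v                            # (M, d)

def bottleneck(x, w_down, w_up):
    """Hypothetical adapter body: down-project, ReLU, up-project."""
    return np.maximum(x @ w_down, 0.0) @ w_up

def build_context(teacher_hidden, p, top_k=1):
    z = lqca_compress(teacher_hidden, p["queries"], p["w_k"], p["w_v"])
    out = z + bottleneck(z, *p["general"])                # always-on adapter
    gate = softmax(z.mean(axis=0) @ p["w_router"])        # (E,) router weights
    chosen = np.argsort(gate)[-top_k:]                    # sparse activation
    for e in chosen:
        out = out + gate[e] * bottleneck(z, *p["experts"][e])
    return out @ p["w_proj"], chosen                      # map to text dim

# Illustrative sizes only (assumptions, not values from the paper).
N, M, d_teacher, d, r, E, L, d_text = 50, 4, 32, 16, 4, 3, 10, 24
params = {
    "queries": rng.normal(size=(M, d)),
    "w_k": rng.normal(size=(d_teacher, d)),
    "w_v": rng.normal(size=(d_teacher, d)),
    "general": (rng.normal(size=(d, r)), rng.normal(size=(r, d))),
    "experts": [(rng.normal(size=(d, r)), rng.normal(size=(r, d)))
                for _ in range(E)],
    "w_router": rng.normal(size=(d, E)),
    "w_proj": rng.normal(size=(d, d_text)),
}

teacher_hidden = rng.normal(size=(N, d_teacher))  # intermediate teacher states
text_emb = rng.normal(size=(L, d_text))           # student's text conditioning

context, chosen = build_context(teacher_hidden, params)
# Append the compact context to the student's conditioning sequence.
conditioned = np.concatenate([text_emb, context], axis=0)
print(conditioned.shape)
```

In this sketch the student itself is untouched: only the compressor, router, adapters, and projection would carry trainable weights, which is the general reason a scheme of this kind can keep the trainable-parameter fraction small.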
Problem

Research questions and friction points this paper is trying to address.

World Action Models
Knowledge Transfer
Parameter Efficiency
Heterogeneous Models
Zero-shot Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Context Knowledge Transfer
Parameter-Efficient Adaptation
World Action Models
Sparse Activation
Text Embedding Space
Yuhua Jiang
Tsinghua University
reinforcement learning
Yijun Guo
LivsynRobotics
Hongbing Yang
Tsinghua University
Guojun Lei
LivsynRobotics
Nuo Chen
Tsinghua University
Yinuo Zhang
PhD student, Duke-NUS Medical School
Protein, Peptides, Biology, Deep Learning
Shaoqiang Yan
Tsinghua University
Bo Lin
Tsinghua University
Feifei Gao
Associate Professor at Tsinghua University, IEEE Fellow
AI-assisted Wireless Communications, Signal Processing for Communications, Array Signal Processing
Biqing Qi
Shanghai AI Laboratory