🤖 AI Summary
This article examines how transfer learning in robotics operates across multiple levels of abstraction, from high-level language instructions to low-level motor skills. By analyzing the transfer mechanisms shared by large language models (LLMs), vision-language models (VLMs), and vision-language-action models (VLAs), it offers a systematic view, from a robotic transfer learning perspective, of the role that foundation models and Transformer architectures play in bridging high-level semantic understanding and low-level action generation. The authors argue that foundation models are a key technology on the route toward full-stack transfer in robotics, while highlighting open challenges such as collecting high-quality data and establishing standardized transfer benchmarks.
📝 Abstract
In humans and robots alike, transfer learning occurs at different levels of abstraction, from high-level linguistic transfer to low-level transfer of motor skills. In this article, we provide an overview of the impact that foundation models and transformer networks have had on these different levels, bringing robots closer than ever to "full-stack transfer". Considering LLMs, VLMs, and VLAs from a robotic transfer learning perspective allows us to highlight recurring concepts for transfer that go beyond specific implementations. We also consider the challenges of data collection and transfer benchmarks for robotics in the age of foundation models. Are foundation models the route to full-stack transfer in robotics? Our expectation is that, at the very least, they will remain a key technology along this route.
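To make the recurring transfer pattern concrete, below is a minimal sketch (not taken from the article) of the recipe common to LLMs, VLMs, and VLAs alike: a Transformer backbone pretrained at scale is frozen, and a small task-specific head is fine-tuned on a modest amount of robot data. All class names, dimensions, and the 7-DoF action space are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of the transfer pattern: a general-purpose pretrained
# Transformer backbone stays frozen, while a small action head is trained on
# robot-specific data. Names and sizes are illustrative only.

class PretrainedBackbone(nn.Module):
    """Stand-in for a foundation-model encoder (weights assumed pretrained)."""
    def __init__(self, dim: int = 256, depth: int = 4, heads: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) embeddings of language/vision inputs
        return self.encoder(tokens)

class ActionHead(nn.Module):
    """Small head mapping backbone features to low-level motor commands."""
    def __init__(self, dim: int = 256, action_dim: int = 7):
        super().__init__()
        self.proj = nn.Linear(dim, action_dim)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Pool over the sequence, then predict one action vector per sample.
        return self.proj(features.mean(dim=1))

backbone = PretrainedBackbone()  # imagine weights from large-scale pretraining
head = ActionHead()

# Transfer step: freeze the backbone, train only the action head.
for p in backbone.parameters():
    p.requires_grad = False
optimizer = torch.optim.Adam(head.parameters(), lr=1e-4)

tokens = torch.randn(8, 16, 256)  # dummy multimodal token embeddings
target = torch.randn(8, 7)        # dummy 7-DoF action targets
loss = nn.functional.mse_loss(head(backbone(tokens)), target)
loss.backward()
optimizer.step()
```

The same skeleton underlies instruction-tuned LLMs, VLM adapters, and VLA policies; what changes across these families is the input modality and the output space, not the freeze-and-fine-tune transfer mechanism itself.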