🤖 AI Summary
This work addresses few-shot task concept learning: acquiring new task intents (e.g., object goals, action patterns) from a few demonstrations and generalizing them to unseen environments. We propose FTL-IGM, the first framework that learns new task concepts with no weight updates by inverting a pretrained invertible neural generative model. Methodologically, the model is first pretrained on basic concepts and their demonstrations; a new concept is then inferred by backpropagating through the frozen model and optimizing only the concept representation, which encodes task intents as compositional, disentangled latent representations. Our key contributions are: (i) applying invertible generative models to few-shot task concept inversion; and (ii) enabling cross-environment transfer and compositional generalization of learned concepts. Experiments span five domains (object rearrangement, goal-oriented navigation, motion caption of human actions, autonomous driving, and real-world table-top manipulation) and demonstrate improvements over existing few-shot methods in rapid task adaptation and in the fidelity of generated plans and motions.
📝 Abstract
Learning the intents of an agent, defined by its goals or motion style, is often extremely challenging from just a few examples. We refer to this problem as task concept learning and present our approach, Few-Shot Task Learning through Inverse Generative Modeling (FTL-IGM), which learns new task concepts by leveraging invertible neural generative models. The core idea is to pretrain a generative model on a set of basic concepts and their demonstrations. Then, given a few demonstrations of a new concept (such as a new goal or a new action), our method learns the underlying concept through backpropagation without updating the model weights, thanks to the invertibility of the generative model. We evaluate our method in five domains: object rearrangement, goal-oriented navigation, motion caption of human actions, autonomous driving, and real-world table-top manipulation. Our experimental results demonstrate that, via the pretrained generative model, we successfully learn novel concepts and generate agent plans or motion corresponding to these concepts (1) in unseen environments and (2) in composition with training concepts.
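The inversion step described in the abstract (keep the pretrained weights fixed and fit only the concept to a few demonstrations) can be illustrated with a deliberately tiny sketch. This is not the paper's implementation: the neural generative model is replaced by a fixed linear map, the gradient is written out by hand instead of using backpropagation, and every name here (`WEIGHTS`, `generate`, `invert_concept`, `TRUE_CONCEPT`) is hypothetical.

```python
# Toy stand-in for a pretrained generative model: a fixed linear map from a
# 2-D concept vector to a 4-D "trajectory". These weights play the role of
# the frozen, pretrained parameters and are never modified below.
WEIGHTS = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]


def generate(weights, concept):
    """Map a concept vector to a trajectory using the frozen weights."""
    return [sum(w * c for w, c in zip(row, concept)) for row in weights]


def loss(weights, concept, demos):
    """Mean (over demos) of the squared error between the generated trajectory and each demo."""
    pred = generate(weights, concept)
    total = sum((p - y) ** 2 for demo in demos for p, y in zip(pred, demo))
    return total / len(demos)


def invert_concept(weights, demos, dim, steps=200, lr=0.1):
    """Fit only the concept vector by gradient descent; weights stay untouched."""
    concept = [0.0] * dim
    for _ in range(steps):
        pred = generate(weights, concept)
        grad = [0.0] * dim
        for demo in demos:
            for row, p, y in zip(weights, pred, demo):
                for j in range(dim):
                    # Derivative of the squared error with respect to concept[j].
                    grad[j] += 2.0 * (p - y) * row[j] / len(demos)
        concept = [c - lr * g for c, g in zip(concept, grad)]
    return concept


# Three slightly perturbed demonstrations of an unseen concept (assumed values).
TRUE_CONCEPT = [0.5, -1.5]
clean = generate(WEIGHTS, TRUE_CONCEPT)
demos = [[y + offset for y in clean] for offset in (-0.01, 0.0, 0.01)]

learned = invert_concept(WEIGHTS, demos, dim=2)
print("learned concept:", learned)
print("loss before:", loss(WEIGHTS, [0.0, 0.0], demos))
print("loss after: ", loss(WEIGHTS, learned, demos))
```

In a realistic setting the linear map would be a pretrained conditional generative model and the hand-written gradient would come from automatic differentiation, with only the concept embedding marked as trainable. The learned concept could then be passed to the same frozen model in a new environment or alongside training concepts, which is the kind of reuse the abstract describes.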