COLLAGE: Adaptive Fusion-based Retrieval for Augmented Policy Learning

📅 2025-08-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

In few-shot imitation learning, retrieving demonstrations from other tasks often pulls in irrelevant or harmful examples. COLLAGE replaces heuristic single-feature distance matching with a task-adaptive late-fusion mechanism that weights multiple pre-selected demonstration subsets. Each subset's weight reflects how well a policy trained on that subset predicts the actions in the target demonstrations, and the weights then drive importance sampling during policy training. The method is agnostic to the feature representation and the retrieval heuristic, so it can combine any number of subsets. In experiments, it outperforms state-of-the-art retrieval and multi-task learning approaches by 5.1% across 10 simulated tasks and by 16.6% across 6 real-world tasks, with retrieval from the DROID dataset. This improves data efficiency when learning policies from only a few target demonstrations.

📝 Abstract

In this work, we study the problem of data retrieval for few-shot imitation learning: selecting data from a large dataset to train a performant policy for a specific task, given only a few target demonstrations. Prior methods retrieve data using a single-feature distance heuristic, assuming that the best demonstrations are those that most closely resemble the target examples in visual, semantic, or motion space. However, this approach captures only a subset of the relevant information and can introduce detrimental demonstrations, e.g., retrieving data from unrelated tasks due to similar scene layouts, or selecting similar motions from tasks with divergent goals. We present COLLAGE, a method for COLLective data AGgrEgation in few-shot imitation learning that uses an adaptive late fusion mechanism to guide the selection of relevant demonstrations based on a task-specific combination of multiple cues. COLLAGE follows a simple, flexible, and efficient recipe: it assigns weights to subsets of the dataset that are pre-selected using a single feature (e.g., appearance, shape, or language similarity), based on how well a policy trained on each subset predicts actions in the target demonstrations. These weights are then used to perform importance sampling during policy training, sampling data more densely or sparsely according to estimated relevance. COLLAGE is general and feature-agnostic, allowing it to combine any number of subsets selected by any retrieval heuristic, and to identify which subsets provide the greatest benefit for the target task. In extensive experiments, COLLAGE outperforms state-of-the-art retrieval and multi-task learning approaches by 5.1% in simulation across 10 tasks, and by 16.6% in the real world across 6 tasks, where we perform retrieval from the large-scale DROID dataset. More information at https://robin-lab.cs.utexas.edu/COLLAGE .
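
The weighting step described in the abstract can be illustrated with a minimal sketch. It assumes that each subset has already been scored by the mean action-prediction error of a policy trained only on that subset and evaluated on the target demonstrations. The function name `subset_weights`, the softmax form, and the `temperature` parameter are assumptions made for illustration; the abstract does not specify the exact scoring or normalization used by COLLAGE.

```python
import numpy as np

def subset_weights(errors, temperature=1.0):
    """Turn per-subset prediction errors into normalized relevance weights.

    errors[i] is the (hypothetical) mean error of a policy trained only on
    subset i when predicting the actions in the target demonstrations.
    Lower error means higher weight. The softmax mapping is an assumption
    of this sketch, not necessarily the paper's formula.
    """
    errors = np.asarray(errors, dtype=float)
    logits = -errors / temperature
    logits = logits - logits.max()  # subtract the max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# Illustrative errors for subsets retrieved by three different features.
feature_names = ["appearance", "shape", "language"]
weights = subset_weights([0.10, 0.50, 0.30])
for name, w in zip(feature_names, weights):
    print(f"{name}: {w:.3f}")
```

A lower temperature concentrates the weight on the best-predicting subset, while a higher one moves the weights toward uniform.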

Problem

Research questions and friction points this paper is trying to address.

Selecting relevant data for few-shot imitation learning

Overcoming limitations of single-feature retrieval heuristics

Adaptively combining multiple cues for optimal demonstration selection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive late fusion for multi-cue retrieval

Weighted importance sampling for policy training

Feature-agnostic collective data aggregation
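
As a rough illustration of weighted sampling during policy training, the sketch below draws each training example by first choosing a subset with probability proportional to its weight and then choosing an item uniformly within that subset. The function `sample_batch`, the toy subsets, and the two-stage sampling scheme are assumptions for illustration and may differ from the sampling procedure used in the paper.

```python
import numpy as np

def sample_batch(subsets, weights, batch_size, rng):
    """Sample a batch so that higher-weight subsets are drawn more densely.

    subsets: list of non-empty lists of training items (hypothetical data).
    weights: non-negative relevance weight per subset.
    """
    weights = np.asarray(weights, dtype=float)
    probs = weights / weights.sum()
    # Stage 1: pick which subset each sample comes from.
    subset_ids = rng.choice(len(subsets), size=batch_size, p=probs)
    # Stage 2: pick one item uniformly from the chosen subset.
    batch = []
    for i in subset_ids:
        items = subsets[i]
        batch.append(items[rng.integers(len(items))])
    return batch

rng = np.random.default_rng(0)
toy_subsets = [["vis_0", "vis_1"], ["lang_0", "lang_1", "lang_2"]]
batch = sample_batch(toy_subsets, weights=[0.8, 0.2], batch_size=8, rng=rng)
print(batch)
```

Sampling the subset first keeps the per-subset weight independent of subset size, so a large but weakly relevant subset does not dominate the batch.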

🔎 Similar Papers

Mutual Enhancement of Large Language and Reinforcement Learning Models through Bi-Directional Feedback Mechanisms: A Case Study