Guiding Data Collection via Factored Scaling Curves

📅 2025-05-12
🤖 AI Summary
To address the prohibitively high cost of collecting data for cross-environment generalization in generalist imitation learning, this paper proposes a targeted data collection method based on Factored Scaling Curves (FSC). The approach builds a decomposable multi-factor scaling model that separates environmental variables, such as camera pose and table height, and quantifies how policy performance changes as data scales along individual factors or pairs of factors. Under a fixed data budget, it directs collection toward the most influential factor combinations, and it can do so using an offline metric rather than real-world evaluation at scale. The method applies to both training-from-scratch and fine-tuning settings and is evaluated in simulation and on real robots. On real-world tasks in new environments, it improves success rates by up to 26% over existing data-collection strategies.

📝 Abstract
Generalist imitation learning policies trained on large datasets show great promise for solving diverse manipulation tasks. However, to ensure generalization to different conditions, policies need to be trained with data collected across a large set of environmental factor variations (e.g., camera pose, table height, distractors), a prohibitively expensive undertaking if done exhaustively. We introduce a principled method for deciding what data to collect and how much to collect for each factor by constructing factored scaling curves (FSC), which quantify how policy performance varies as data scales along individual or paired factors. These curves enable targeted data acquisition for the most influential factor combinations within a given budget. We evaluate the proposed method through extensive simulated and real-world experiments, across both training-from-scratch and fine-tuning settings, and show that it boosts success rates in real-world tasks in new environments by up to 26% over existing data-collection strategies. We further demonstrate how factored scaling curves can effectively guide data collection using an offline metric, without requiring real-world evaluation at scale.
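The idea of fitting a scaling curve per factor and then spending a budget where the curves predict the largest gain can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the power-law form `err = a * n**(-b)`, the use of error (1 minus success rate) as the fitted quantity, the greedy allocation rule, the function names, and the synthetic pilot numbers. The sketch also handles individual factors only, not the paired factors mentioned in the abstract.

```python
import numpy as np

def fit_power_law(n, err):
    """Fit err = a * n**(-b) by linear least squares in log-log space.

    The power-law form is an assumption for this sketch, not the paper's model.
    """
    slope, intercept = np.polyfit(np.log(n), np.log(err), 1)
    return float(np.exp(intercept)), float(-slope)

def allocate_budget(curves, start, budget, step=10):
    """Greedily assign `budget` demonstrations in chunks of `step`.

    Each chunk goes to the factor whose fitted curve predicts the largest
    drop in error for that chunk. This greedy rule is a hypothetical
    stand-in for whatever allocation strategy the authors use.
    """
    counts = dict(start)
    remaining = budget
    while remaining >= step:
        def predicted_gain(factor):
            a, b = curves[factor]
            n = counts[factor]
            return a * n ** (-b) - a * (n + step) ** (-b)
        best = max(curves, key=predicted_gain)
        counts[best] += step
        remaining -= step
    return counts

# Synthetic pilot data (made up): error measured after collecting
# n demonstrations that vary a single factor.
pilot_n = np.array([10.0, 20.0, 40.0, 80.0])
pilot_err = {
    "camera_pose": 0.8 * pilot_n ** -0.5,   # error falls quickly with data
    "table_height": 0.3 * pilot_n ** -0.1,  # error falls slowly with data
}

curves = {f: fit_power_law(pilot_n, e) for f, e in pilot_err.items()}
start = {f: 80 for f in pilot_err}          # demos already collected per factor
plan = allocate_budget(curves, start, budget=100)
print(plan)
```

With these synthetic curves, most of the additional budget goes to `camera_pose`, since its fitted curve still predicts large error reductions, while `table_height` receives only a small share. Real measurements would be noisy, so a practical version would need uncertainty estimates on the fitted parameters before trusting the allocation.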
Problem

Research questions and friction points this paper is trying to address.

Determining optimal data collection for diverse environmental factors
Improving policy generalization without exhaustive data collection
Efficiently guiding data acquisition with performance scaling curves
Innovation

Methods, ideas, or system contributions that make the work stand out.

Factored scaling curves guide data collection
Targeted data acquisition for the most influential factors
Offline metric avoids real-world evaluation at scale