🤖 AI Summary
Vision-Language-Action (VLA) models suffer from limited generalization because they rely heavily on large-scale, redundant datasets, and centralized optimization paradigms fail to fundamentally alleviate this data bottleneck. Method: we propose a data-centric generative distillation framework featuring (i) a novel Fact-Tracing engine that integrates causal attribution with programmatic contrastive verification to enable interpretable, quantitative assessment of each sample's intrinsic value; (ii) an adversarial Neural Concept Formation Module (NCFM) that generates model-agnostic, information-dense, high-fidelity synthetic data; and (iii) construction of a minimal yet sufficient core dataset. Results: on mainstream VLA benchmarks, our distilled dataset, comprising only 5% of the original training data, achieves 85–90% of the full-data success rate, reduces training time by over 80%, and significantly improves data efficiency and deployment feasibility.
📝 Abstract
The powerful generalization of Vision-Language-Action (VLA) models is bottlenecked by their heavy reliance on massive datasets that are redundant and unevenly valuable, hindering their widespread application. Existing model-centric optimization paths fail to fundamentally address this data-level challenge: model compression often degrades performance, and policy distillation yields products that are model-dependent and lack generality. To this end, this paper introduces FT-NCFM, a fundamentally different, data-centric generative data distillation framework. Our framework employs a self-contained Fact-Tracing (FT) engine that combines causal attribution with programmatic contrastive verification to assess the intrinsic value of samples. Guided by these assessments, an adversarial NCFM process synthesizes a model-agnostic, information-dense, and reusable data asset. Experimental results on several mainstream VLA benchmarks show that models trained on our distilled coreset, comprising just 5% of the original data, achieve 85–90% of the success rate of full-dataset training while reducing training time by over 80%. Our work demonstrates that intelligent data distillation is a highly promising new path for building efficient, high-performance VLA models.
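The value-scored coreset selection described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not specify the FT engine's internals, so the causal-attribution and contrastive-verification scores below are random placeholders, and only the 5% selection budget comes from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Hypothetical per-sample scores, standing in for what an FT-style engine
# might produce: a causal-attribution score and a contrastive-verification score.
causal_score = rng.random(n_samples)        # placeholder: attribution to task success
contrastive_score = rng.random(n_samples)   # placeholder: agreement under contrastive checks
value = causal_score * contrastive_score    # combined intrinsic-value estimate

# Keep the top 5% of samples as the distilled coreset, mirroring the paper's budget.
budget = int(0.05 * n_samples)
coreset_idx = np.argsort(value)[-budget:]

print(len(coreset_idx))  # 50 samples, i.e. 5% of the original data
```

In the actual framework, a generative NCFM stage would then synthesize condensed data guided by these valuations rather than simply subsampling, but the budgeted selection above captures the data-centric framing.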