RDTF: Resource-efficient Dual-mask Training Framework for Multi-frame Animated Sticker Generation

📅 2025-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Lightweight multi-frame animated sticker generation (ASG) under resource constraints suffers from poor few-shot generalization and target-domain shift, especially when relying on parameter-efficient fine-tuning. Method: This work abandons parameter-efficient adaptation entirely and instead trains a compact model from scratch. To address limited data fidelity and domain misalignment, it introduces a dual-mask data utilization strategy, which preserves inter-frame coherence while enhancing local detail modeling, and a difficulty-adaptive curriculum learning framework that dynamically modulates training difficulty via a static-adaptive entropy decomposition. Contribution/Results: The method trains a discrete-frame generative network end-to-end on a million-scale few-shot dataset and significantly outperforms state-of-the-art parameter-efficient approaches (e.g., I2V-Adapter, SimDA) on ASG in both quantitative metrics and visual quality. Crucially, it empirically validates that lightweight video generation models can achieve high-fidelity in-domain synthesis without fine-tuning a large pre-trained model.

📝 Abstract
Recently, great progress has been made in video generation technology, attracting widespread attention from researchers. To apply this technology to downstream applications under resource-constrained conditions, practitioners usually fine-tune pre-trained models with parameter-efficient tuning methods such as Adapter or LoRA. Although these methods can transfer knowledge from the source domain to the target domain, the small number of trainable parameters limits fitting ability, and knowledge carried over from the source domain may cause the inference process to deviate from the target domain. In this paper, we argue that under constrained resources, training a smaller video generation model from scratch on only million-level samples can outperform parameter-efficient tuning of larger models in downstream applications: the key lies in the effective utilization of data and the curriculum strategy. Taking animated sticker generation (ASG) as a case study, we first construct a discrete frame generation network for stickers with low frame rates, ensuring that its parameter count meets the requirements of model training under constrained resources. To provide data support for models trained from scratch, we propose a dual-mask based data utilization strategy, which improves the availability and expands the diversity of limited data. To facilitate convergence under the dual-mask setting, we propose a difficulty-adaptive curriculum learning method, which decomposes sample entropy into static and adaptive components so as to present samples from easy to difficult. Experiments demonstrate that our resource-efficient dual-mask training framework is quantitatively and qualitatively superior to parameter-efficient tuning methods such as I2V-Adapter and SimDA, verifying the feasibility of our method on downstream tasks under constrained resources. Code will be available.
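The abstract only describes the dual-mask strategy at a high level. As a rough illustration, one plausible reading is a temporal condition mask (selecting which frames are given, so a single model covers prediction, interpolation, and unconditional generation while preserving inter-frame coherence) combined with a spatial mask that hides local regions to encourage detail modeling. All names, mask shapes, and sampling choices below are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def sample_dual_mask(num_frames, height, width, rng):
    """Hypothetical dual mask for one training clip (illustrative sketch only).

    Returns:
      frame_mask:  (num_frames,) 1 = frame given as condition, 0 = frame to generate
      region_mask: (num_frames, height, width) 1 = pixel visible, 0 = pixel hidden
    """
    # Temporal mask: randomly pick a conditioning pattern so one model is
    # trained across prediction (first frame given), interpolation
    # (both endpoints given), and unconditional generation.
    pattern = rng.choice(["predict", "interpolate", "unconditional"])
    frame_mask = np.zeros(num_frames, dtype=np.int64)
    if pattern == "predict":
        frame_mask[0] = 1
    elif pattern == "interpolate":
        frame_mask[0] = frame_mask[-1] = 1

    # Spatial mask: hide a random rectangle in every frame so the model
    # must reconstruct local detail rather than copy visible pixels.
    region_mask = np.ones((num_frames, height, width), dtype=np.int64)
    h = rng.integers(1, height // 2 + 1)
    w = rng.integers(1, width // 2 + 1)
    top = rng.integers(0, height - h + 1)
    left = rng.integers(0, width - w + 1)
    region_mask[:, top:top + h, left:left + w] = 0
    return frame_mask, region_mask

rng = np.random.default_rng(0)
fm, rm = sample_dual_mask(8, 32, 32, rng)
```

Sampling both masks per clip effectively multiplies the number of distinct training views of each sticker, which matches the abstract's claim of improving the availability and diversity of limited data.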
Problem

Research questions and friction points this paper is trying to address.

Efficient training for video generation under resource constraints
Improving data utilization with dual-mask strategy
Enhancing convergence via difficulty-adaptive curriculum learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-mask strategy enhances limited data utility
Curriculum learning adapts to sample difficulty
Small model trained from scratch outperforms tuning
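The difficulty-adaptive curriculum is described only as decomposing sample entropy into static and adaptive components to order samples from easy to difficult. A minimal sketch of that idea, under the assumption that the static part is precomputed per sample (e.g., image entropy) and the adaptive part is a running per-sample model loss, with a linear pacing function (all of which are illustrative choices, not the paper's exact method):

```python
import numpy as np

def curriculum_batch(static_entropy, adaptive_loss, step, total_steps,
                     batch_size, rng, alpha=0.5):
    """Hypothetical easy-to-hard batch sampler (illustrative sketch only).

    static_entropy: (N,) difficulty component fixed before training
    adaptive_loss:  (N,) running model loss per sample, updated during training
    """
    # Decompose difficulty into static and adaptive components.
    difficulty = alpha * static_entropy + (1.0 - alpha) * adaptive_loss

    # Pacing function: the admitted pool grows linearly from 20% to 100%
    # of the dataset over training.
    frac = 0.2 + 0.8 * min(step / total_steps, 1.0)
    pool_size = max(batch_size, int(frac * len(difficulty)))

    # Draw the batch from the easiest `pool_size` samples.
    pool = np.argsort(difficulty)[:pool_size]
    return rng.choice(pool, size=batch_size, replace=False)

rng = np.random.default_rng(0)
static = rng.random(100)
adaptive = rng.random(100)
idx = curriculum_batch(static, adaptive, step=0, total_steps=1000,
                       batch_size=8, rng=rng)
```

Early in training (`step=0`) batches are drawn only from the easiest 20% of samples; as the pacing fraction grows, harder samples (including heavily masked dual-mask views) enter the pool, which is one way the curriculum could ease convergence under the dual-mask setting.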
👥 Authors
Zhiqiang Yuan (Fudan University)
Ting Zhang (Pattern Recognition Center, WeChat AI, Tencent)
Ying Deng (Pattern Recognition Center, WeChat AI, Tencent)
Jiapei Zhang (Pattern Recognition Center, WeChat AI, Tencent)
Yeshuang Zhu (WeChat - Basic Architecture Dept., Tencent Inc.; interests: natural language processing, image/video generation, human-computer interaction)
Zexi Jia (Pattern Recognition Center, WeChat AI, Tencent)
Jie Zhou (Pattern Recognition Center, WeChat AI, Tencent)
Jinchao Zhang (WeChat AI - Pattern Recognition Center; interests: deep learning, natural language processing, machine translation, dialogue systems)