Mixtraining: A Better Trade-Off Between Compute and Performance

📅 2025-02-26

📈 Citations: 0

✨ Influential: 0

career value

215K/year

🤖 AI Summary

To address computational redundancy, representation fragmentation, and imbalanced compute-accuracy trade-offs arising from sequential self-supervised learning (SSL) and supervised learning (SL) in data-constrained settings, this paper proposes an interleaved SSL/SL hybrid training framework. It dynamically alternates SSL and SL tasks within a single training pipeline, sharing gradient updates and parameter optimization paths, while incorporating dynamic loss weighting and scheduling. This is the first work to achieve unified, co-optimized SSL and SL on the ViT-Tiny architecture. On TinyImageNet, it achieves an absolute accuracy gain of 8.81% (18.89% relative improvement), accelerates training by 1.29×, and significantly advances the joint Pareto frontier of resource efficiency and model performance.

Technology Category

Application Category

📝 Abstract

Incorporating self-supervised learning (SSL) before standard supervised learning (SL) has become a widely used strategy to enhance model performance, particularly in data-limited scenarios. However, this approach introduces a trade-off between computation and performance: while SSL helps with representation learning, it requires a separate, often time-consuming training phase, increasing computational overhead and limiting efficiency in resource-constrained settings. To address these challenges, we propose MixTraining, a novel framework that interleaves several SSL and SL epochs within a unified mixtraining training phase, featuring a smooth transition between two learning objectives. MixTraining enhances synergy between SSL and SL for improved accuracy and consolidates shared computation steps to reduce computation overhead. MixTraining is versatile and applicable to both single-task and multi-task learning scenarios. Extensive experiments demonstrate that MixTraining offers a superior compute-performance trade-off compared to conventional pipelines, achieving an 8.81% absolute accuracy gain (18.89% relative accuracy gain) on the TinyImageNet dataset while accelerating training by up to 1.29x with the ViT-Tiny model.

Problem

Research questions and friction points this paper is trying to address.

Balances computation and performance

Integrates SSL and SL efficiently

Reduces training overhead effectively

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates SSL and SL epochs

Reduces computational overhead

Enhances model accuracy significantly

🔎 Similar Papers

No similar papers found.