Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning

📅 2025-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing data mixing strategies in multi-domain learning lack theoretical foundations for determining optimal training order, hindering generalization across domains. Method: We introduce Lie bracket theory—previously unexplored in this context—to analyze multi-domain training dynamics. Specifically, we derive a trajectory optimality criterion based on the Lie bracket of gradient vector fields, characterizing local parameter-space regions where domain switching improves optimization. This establishes a differential-geometric link between the training sequence and the geometry of the loss landscape. Our approach integrates gradient flow analysis, multi-task optimization theory, and empirical validation via bilingual large language model (LLM) pretraining. Results: Experiments on synthetic toy models and bilingual LLM pretraining confirm the theoretical predictions, demonstrating significant gains in cross-domain generalization. The framework yields an interpretable, geometry-driven principle for data scheduling—offering the first theoretically grounded, differential-geometric design methodology for multi-domain training order.

📝 Abstract
In multi-domain learning, a single model is trained on diverse data domains to leverage shared knowledge and improve generalization. The order in which the data from these domains is used for training can significantly affect the model's performance on each domain. However, this dependence is under-studied. In this paper, we investigate the influence of training order (or data mixing) in multi-domain learning using the concept of Lie bracket of gradient vector fields. By analyzing the infinitesimal effects of changing the training order, we identify regions in the parameter space where altering the order between two training domains can benefit the target loss. We validate the predictions of our theoretical framework on the influence of training order (or data mixing) both on a toy example and bilingual LLM pre-training.
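To make the abstract's central object concrete: for two quadratic toy losses the gradient vector fields are linear, and their Lie bracket reduces to a matrix commutator that exactly measures how much two successive gradient steps depend on their order. The sketch below is an illustrative reconstruction, not the paper's code; the matrices `A` and `B` are arbitrary non-commuting choices.

```python
import numpy as np

# Two toy "domain" losses L_A(x) = 0.5 x^T A x and L_B(x) = 0.5 x^T B x.
# Their negative-gradient vector fields are X(x) = -A x and Y(x) = -B x.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])

def lie_bracket(x):
    # For linear fields, [X, Y](x) = DY·X - DX·Y = (B @ A - A @ B) @ x.
    return (B @ A - A @ B) @ x

x0 = np.array([1.0, 1.0])
eta = 0.1  # learning rate

# One gradient step on domain A then one on domain B, and vice versa.
x_ab = (np.eye(2) - eta * B) @ (np.eye(2) - eta * A) @ x0
x_ba = (np.eye(2) - eta * A) @ (np.eye(2) - eta * B) @ x0

# Because the fields are linear here, the order dependence is *exactly*
# eta^2 times the Lie bracket; in general it holds to leading order.
diff = x_ab - x_ba
print(np.allclose(diff, eta**2 * lie_bracket(x0)))  # prints True
```

A nonzero bracket signals parameter-space regions where swapping the order of the two domains changes the trajectory, which is the quantity the paper's optimality criterion evaluates.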
Problem

Research questions and friction points this paper is trying to address.

Multi-domain Learning
Data Type Influence
Optimal Learning Path
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gradient Vector Fields
Lie Bracket
Multi-domain Learning
Alexey Rukhovich
Noah’s Ark Lab
Alexander Podolskiy
Noah’s Ark Lab
Irina Piontkovskaya
Huawei Noah's Ark Lab
natural language processing