Commute Your Domains: Trajectory Optimality Criterion for Multi-Domain Learning

📅 2025-01-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing data mixing strategies in multi-domain learning lack theoretical foundations for determining optimal training order, hindering generalization across domains. Method: We introduce Lie bracket theory—previously unexplored in this context—to analyze multi-domain training dynamics. Specifically, we derive a trajectory optimality criterion based on the Lie bracket of gradient vector fields, characterizing local parameter-space regions where domain switching improves optimization. This establishes a differential-geometric link between the training sequence and the geometry of the loss landscape. Our approach integrates gradient flow analysis, multi-task optimization theory, and empirical validation via bilingual large language model (LLM) pretraining. Results: Experiments on synthetic toy models and bilingual LLM pretraining confirm the theoretical predictions, demonstrating significant gains in cross-domain generalization. The framework yields an interpretable, geometry-driven principle for data scheduling—offering the first theoretically grounded, differential-geometric design methodology for multi-domain training order.

📝 Abstract
In multi-domain learning, a single model is trained on diverse data domains to leverage shared knowledge and improve generalization. The order in which the data from these domains is used for training can significantly affect the model's performance on each domain. However, this dependence is under-studied. In this paper, we investigate the influence of training order (or data mixing) in multi-domain learning using the concept of Lie bracket of gradient vector fields. By analyzing the infinitesimal effects of changing the training order, we identify regions in the parameter space where altering the order between two training domains can benefit the target loss. We validate the predictions of our theoretical framework on the influence of training order (or data mixing) both on a toy example and bilingual LLM pre-training.
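To make the abstract's central object concrete: for two quadratic toy losses the gradient vector fields are linear, and their Lie bracket reduces to a matrix commutator that exactly measures how much two successive gradient steps depend on their order. The sketch below is an illustrative reconstruction, not the paper's code; the matrices `A` and `B` are arbitrary non-commuting choices.

```python
import numpy as np

# Two toy "domain" losses L_A(x) = 0.5 x^T A x and L_B(x) = 0.5 x^T B x.
# Their negative-gradient vector fields are X(x) = -A x and Y(x) = -B x.
A = np.array([[2.0, 0.0],
              [0.0, 1.0]])
B = np.array([[1.0, 1.0],
              [1.0, 1.0]])

def lie_bracket(x):
    # For linear fields, [X, Y](x) = DY·X - DX·Y = (B @ A - A @ B) @ x.
    return (B @ A - A @ B) @ x

x0 = np.array([1.0, 1.0])
eta = 0.1  # learning rate

# One gradient step on domain A then one on domain B, and vice versa.
x_ab = (np.eye(2) - eta * B) @ (np.eye(2) - eta * A) @ x0
x_ba = (np.eye(2) - eta * A) @ (np.eye(2) - eta * B) @ x0

# Because the fields are linear here, the order dependence is *exactly*
# eta^2 times the Lie bracket; in general it holds to leading order.
diff = x_ab - x_ba
print(np.allclose(diff, eta**2 * lie_bracket(x0)))  # prints True
```

A nonzero bracket signals parameter-space regions where swapping the order of the two domains changes the trajectory, which is the quantity the paper's optimality criterion evaluates.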
Problem

Research questions and friction points this paper is trying to address.

Multi-domain Learning
Data Type Influence
Optimal Learning Path
Innovation

Methods, ideas, or system contributions that make the work stand out.

Gradient Vector Fields
Lie Bracket
Multi-domain Learning
Alexey Rukhovich
Noah’s Ark Lab
Alexander Podolskiy
Noah’s Ark Lab
Irina Piontkovskaya
Huawei Noah's Ark Lab
natural language processing