🤖 AI Summary
Gradient-based domain generalization suffers from inconsistent and unstable inter-domain gradient directions, along with high computational overhead from second-order derivative approximations. To address these issues, this paper proposes Pareto-Optimal Gradient Matching (POGM), which models gradient trajectories as learnable signals and jointly optimizes them within a meta-learning framework: (i) maximizing inter-domain gradient inner products to enforce directional consistency, and (ii) constraining gradients to remain aligned with the empirical risk minimization direction to suppress oscillation. POGM employs first-order meta-updates to efficiently locate Pareto-optimal solutions, eliminating the need for costly second-order approximations. Evaluated on DomainBed, POGM achieves competitive generalization performance relative to state-of-the-art methods while reducing training cost by over 30%, demonstrating both effectiveness and practical efficiency.
📝 Abstract
In this study, we address the gradient-based domain generalization problem, where predictors aim for consistent gradient directions across different domains. Existing methods face two main challenges. First, matching gradients by minimizing their empirical distance or maximizing their inner products (GIP) can cause gradient fluctuations among domains, hindering straightforward learning. Second, directly applying gradient learning to the joint loss function can incur high computational overhead due to second-order derivative approximation. To tackle these challenges, we propose a new Pareto Optimality Gradient Matching (POGM) method. In contrast to existing methods that add gradient matching as a regularizer, we treat gradient trajectories as collected data and train the meta-learner on them independently. In the meta-update, we maximize GIP while preventing the learned gradient from deviating too far from the empirical risk minimization (ERM) gradient trajectory. As a result, the aggregate gradient can incorporate knowledge from all domains without fluctuating toward any particular domain. Experimental evaluations on datasets from DomainBed demonstrate that POGM achieves competitive results against other baselines while remaining computationally efficient.
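The meta-update described above can be sketched as a simple first-order procedure: ascend on the sum of inner products with the per-domain gradients, then project the result back into a ball around the ERM (mean) gradient so no single domain dominates. The sketch below is illustrative only; the function name, the `radius`/`lr`/`steps` parameters, and the projection step are assumptions, not the paper's actual implementation.

```python
def pogm_aggregate(domain_grads, radius=0.1, lr=0.05, steps=50):
    """Illustrative first-order sketch of POGM-style gradient aggregation.

    Finds an aggregate gradient that increases the sum of inner products
    with the per-domain gradients (the GIP objective) while staying within
    `radius` of the ERM (mean) gradient. All names and hyperparameters
    here are hypothetical placeholders.
    """
    dim = len(domain_grads[0])
    n = len(domain_grads)
    # ERM gradient: plain average over domain gradients
    g_erm = [sum(g[j] for g in domain_grads) / n for j in range(dim)]
    # Ascent direction for sum_i <g, g_i> is simply sum_i g_i
    g_sum = [sum(g[j] for g in domain_grads) for j in range(dim)]
    g = list(g_erm)
    for _ in range(steps):
        # First-order ascent on the total inner product (no second-order terms)
        g = [g[j] + lr * g_sum[j] for j in range(dim)]
        # Project back onto the ball of the given radius around the ERM gradient
        delta = [g[j] - g_erm[j] for j in range(dim)]
        norm = sum(d * d for d in delta) ** 0.5
        if norm > radius:
            g = [g_erm[j] + delta[j] * (radius / norm) for j in range(dim)]
    return g
```

Because both the ascent step and the projection use only first-order quantities, this kind of update avoids the second-order derivative approximations that the abstract identifies as a cost bottleneck.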