AI Summary
Algorithm selection for out-of-distribution (OOD) generalization remains largely empirical, relying on trial and error due to the absence of distributional priors and principled guidance. Method: This paper introduces the first learnable training-algorithm selector, trained solely on dataset meta-features (e.g., statistical, spectral, and semantic properties), to predict the optimal training algorithm across diverse distribution shifts. We formulate algorithm selection as a multi-label classification task and propose a unified evaluation framework integrating cross-domain meta-feature extraction with multi-source benchmarks (synthetic, vision, and language domains). Contribution/Results: Experiments demonstrate that the selector robustly identifies high-performing algorithms on unseen datasets and previously unobserved shift types, significantly outperforming validation-set tuning and heuristic strategies. Our work reveals learnable patterns governing algorithm applicability under distribution shift, establishing a new paradigm for co-designing and automating OOD generalization methods.
Abstract
Out-of-distribution (OOD) generalization is challenging because distribution shifts come in many forms. Numerous algorithms exist to address specific settings, but choosing the right training algorithm for the right dataset without trial and error is difficult. Indeed, real-world applications often involve multiple types and combinations of shifts that are hard to analyze theoretically. Method. This work explores the possibility of learning the selection of a training algorithm for OOD generalization. We propose a proof of concept (OOD-Chameleon) that formulates the selection as a multi-label classification over candidate algorithms, trained on a dataset of datasets representing a variety of shifts. We evaluate the ability of OOD-Chameleon to rank algorithms on unseen shifts and datasets based only on dataset characteristics, i.e., without training models first, unlike traditional model selection. Findings. Extensive experiments show that the learned selector identifies high-performing algorithms across synthetic, vision, and language tasks. Further inspection shows that it learns non-trivial decision rules, which provide new insights into the applicability of existing algorithms. Overall, this new approach opens the possibility of better exploiting and understanding the plethora of existing algorithms for OOD generalization.
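To make the formulation concrete, here is a minimal sketch of the selection step: predicting a *set* of suitable training algorithms for an unseen dataset from its meta-features alone, with no model training. The paper trains a learned classifier on a dataset of datasets; this sketch swaps in a simple k-nearest-neighbor multi-label predictor as a stand-in, and all feature names, values, and algorithm labels below are invented for illustration, not taken from the paper.

```python
# Illustrative sketch of multi-label algorithm selection from dataset
# meta-features (OOD-Chameleon's formulation). The k-NN predictor and all
# numbers/feature names are hypothetical stand-ins, not the paper's method.

ALGORITHMS = ["ERM", "GroupDRO", "Reweighting", "Mixup"]

# Each row: (meta-feature vector, set of algorithms that performed well).
# Invented meta-features: [spurious-correlation strength, label imbalance,
# covariate-shift magnitude].
META_DATASET = [
    ([0.9, 0.1, 0.2], {"GroupDRO", "Reweighting"}),
    ([0.1, 0.8, 0.1], {"Reweighting"}),
    ([0.2, 0.2, 0.9], {"ERM", "Mixup"}),
    ([0.8, 0.7, 0.3], {"GroupDRO"}),
]

def _dist(a, b):
    # Euclidean distance between two meta-feature vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_algorithms(meta_features, k=1):
    """Predict the set of suitable algorithms for an unseen dataset
    from its meta-features alone (no models are trained first)."""
    neighbors = sorted(META_DATASET, key=lambda row: _dist(row[0], meta_features))
    votes = {}
    for _, labels in neighbors[:k]:
        for alg in labels:
            votes[alg] = votes.get(alg, 0) + 1
    # Keep algorithms endorsed by a majority of the k nearest datasets.
    return {alg for alg, v in votes.items() if v >= (k + 1) // 2}
```

For example, `select_algorithms([0.85, 0.15, 0.25])` lands near the strongly spurious-correlated row and returns its labels; the learned selector in the paper plays the same role but with a trained multi-label classifier in place of the neighbor lookup.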