🤖 AI Summary
In single-source domain generalization, data augmentation often induces severe out-of-distribution (OOD) performance fluctuations, primarily because models struggle to stably accumulate knowledge from diverse augmentations, which exacerbates feature distortion. To address this, we propose Parameter-space Ensemble with Entropy Regularization (PEER): a master–proxy dual-model framework in which the proxy model specializes in augmented samples while the master model progressively assimilates the proxy's knowledge through parameter averaging. Maximizing the mutual information between the two models' output representations guides the proxy's learning and suppresses feature distortion during training. Evaluated on the PACS, Digits, Office-Home, and VLCS benchmarks, PEER achieves state-of-the-art OOD generalization using only simple random augmentation, significantly reducing performance variance compared to existing methods that rely on complex, hand-crafted augmentations.
📝 Abstract
Data augmentation is a popular tool for single-source domain generalization (sDG): it expands the source domain by generating simulated domains, improving generalization on unseen target domains. In this work, we show that the target-domain performance of such augmentation-based methods universally fluctuates during training, posing challenges for model selection under realistic scenarios. We argue that the fluctuation stems from the model's inability to accumulate the knowledge learned from diverse augmentations, exacerbating feature distortion during training. Based on this observation, we propose a novel generalization method, coined Parameter-Space Ensemble with Entropy Regularization (PEER), that uses a proxy model to learn the augmented data on behalf of the main model. The main model is updated by averaging its parameters with those of the proxy model, progressively accumulating knowledge over the training steps. Maximizing the mutual information between the output representations of the two models guides the learning process of the proxy model, mitigating feature distortion during training. Experimental results demonstrate the effectiveness of PEER in reducing OOD performance fluctuation and enhancing generalization across various datasets, including PACS, Digits, Office-Home, and VLCS. Notably, our method with simple random augmentation achieves state-of-the-art performance, surpassing prior sDG approaches that utilize complex data augmentation strategies.
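The core update described above, where the main (master) model accumulates the proxy's knowledge by parameter averaging, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the averaging schedule (a running mean over snapshots, or an exponential moving average) and all function and parameter names here are assumptions for exposition.

```python
import numpy as np

def peer_average(master, proxy, step, alpha=None):
    """Sketch of a parameter-space ensemble update (hypothetical schedule).

    master, proxy: dicts mapping parameter names to numpy arrays.
    If alpha is None, take a running mean over training snapshots:
        master_t = (t * master + proxy) / (t + 1)
    Otherwise, use an exponential moving average with rate alpha.
    """
    if alpha is None:
        # running mean: each snapshot of the proxy contributes equally
        return {k: (step * master[k] + proxy[k]) / (step + 1) for k in master}
    # EMA: recent proxy snapshots are weighted more heavily
    return {k: alpha * master[k] + (1.0 - alpha) * proxy[k] for k in master}

# Toy example: the proxy has drifted to all-ones weights after training
# on augmented data; the master absorbs it gradually.
master = {"w": np.zeros(3)}
proxy = {"w": np.ones(3)}
master = peer_average(master, proxy, step=1)   # running mean of two snapshots
print(master["w"])                             # → [0.5 0.5 0.5]
```

In this sketch only the proxy would receive gradient updates on augmented samples; the master is never trained directly, which is what lets it accumulate knowledge smoothly instead of fluctuating with each augmentation. The mutual-information alignment between the two models' representations is omitted here, as its exact form is specific to the paper.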