🤖 AI Summary
This work investigates whether unsupervised domain adaptation (UDA) makes neural networks trained solely on simulated data more robust to real-world observations under simulation-to-reality domain shift, with a focus on the reliability of Bayesian inference. Method: The authors use a controlled benchmark and a high-dimensional benchmark, systematically inject different kinds of domain mismatch (unmodeled noise, unmodeled phenomena, and prior misspecification), and evaluate UDA methods that align the summary (embedding) spaces of simulated and observed data. Contribution/Results: UDA effectively mitigates the impact of unmodeled noise and unmodeled phenomena on neural amortized Bayesian inference (ABI); however, the same alignment mechanism can fail under prior misspecification, so the benefit of UDA depends on the type of mismatch. The work maps out where UDA helps and where it fails in ABI, and gives practical guidance for deploying simulation-trained models in scientific inference.
📝 Abstract
Neural networks are fragile when confronted with data that significantly deviates from their training distribution. This holds in particular for simulation-based inference methods, such as neural amortized Bayesian inference (ABI), where models trained on simulated data are deployed on noisy real-world observations. Recent robust approaches employ unsupervised domain adaptation (UDA) to match the embedding spaces of simulated and observed data. However, the lack of comprehensive evaluations across different domain mismatches raises concerns about their reliability in high-stakes applications. We address this gap by systematically testing UDA approaches across a wide range of misspecification scenarios in both a controlled and a high-dimensional benchmark. We demonstrate that aligning summary spaces between domains effectively mitigates the impact of unmodeled phenomena or noise. However, the same alignment mechanism can lead to failures under prior misspecification, a critical finding with practical consequences. Our results underscore the need for careful consideration of misspecification types when using UDA techniques to increase the robustness of ABI in practice.
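To make "aligning summary spaces between domains" concrete, the sketch below computes a maximum mean discrepancy (MMD) between two sets of summary vectors. MMD is one common choice of alignment penalty in domain adaptation; the abstract does not say which discrepancy the evaluated methods use, so treat the Gaussian kernel, the bandwidth, and the helper names `gaussian_kernel`, `mmd_squared`, and `sample_summaries` as illustrative assumptions. The "summaries" here are synthetic Gaussian draws standing in for the output of a summary network.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """RBF kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_xx = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, bandwidth) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, bandwidth) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def sample_summaries(n, dim, shift, rng):
    """Hypothetical stand-in for summary-network outputs: Gaussian vectors with a mean shift."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
simulated = sample_summaries(100, 2, 0.0, rng)         # summaries of simulated data
observed_matched = sample_summaries(100, 2, 0.0, rng)  # observations from the same distribution
observed_shifted = sample_summaries(100, 2, 1.5, rng)  # observations under a domain shift

mmd_matched = mmd_squared(simulated, observed_matched)
mmd_shifted = mmd_squared(simulated, observed_shifted)
print(f"matched: {mmd_matched:.4f}  shifted: {mmd_shifted:.4f}")
```

A UDA method of this kind would add such a term to the training loss so that the summary network maps simulated and observed data to similar distributions. Note that the penalty is blind to the cause of the discrepancy: it shrinks the gap whether it stems from unmodeled noise or from observations whose true parameters lie outside the training prior, which is one plausible reading of why alignment can hurt under prior misspecification.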