🤖 AI Summary
This work addresses the challenge of physics-informed statistical inference when experimental data are modeled via generative models. We propose the first general-purpose Generator-Based Inference (GBI) framework, extending Simulation-Based Inference (SBI) to data-driven generators—including Normalizing Flows and GANs—enabling binless, high-dimensional parameter estimation and anomaly detection. Methodologically, we introduce a sideband-learning strategy to construct background generators and integrate density-ratio estimation with Bayesian inference to ensure statistical interpretability. In resonance anomaly detection, GBI achieves significantly improved sensitivity and sets a new state-of-the-art on the LHCO benchmark. Crucially, its outputs—parameter posteriors and anomaly significance scores—carry direct statistical semantics. By decoupling inference from specific physics models, GBI establishes an interpretable, scalable paradigm for model-agnostic anomaly detection in high-energy physics and beyond.
📝 Abstract
Statistical inference in physics is often based on samples from a generator (sometimes referred to as a "forward model") that emulate experimental data and depend on parameters of the underlying theory. Modern machine learning has supercharged this workflow, enabling high-dimensional, unbinned analyses that use much more information than was previously possible. We propose a general framework for describing the integration of machine learning with generators called Generator Based Inference (GBI). A well-studied special case of this setup is Simulation Based Inference (SBI), where the generator is a physics-based simulator. In this work, we examine other approaches within the GBI toolkit that build the generator with data-driven methods. In particular, we focus on resonant anomaly detection, where the generator describing the background is learned from sidebands. We show how to perform machine learning-based parameter estimation in this context with data-derived generators. This makes the statistical outputs of anomaly detection directly interpretable, and the performance on the LHCO community benchmark dataset establishes a new state of the art for anomaly detection sensitivity.
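The idea of learning a background generator from sidebands and then estimating a signal parameter can be illustrated with a deliberately simple toy. The sketch below is not the method of the paper: it assumes a single feature `x` whose background distribution drifts linearly with the resonant mass `m`, replaces the machine-learned generator with a linear-Gaussian fit, and assumes the signal shape is known. All names and numbers (`normal_pdf`, the signal region 4 < m < 6, the signal template) are hypothetical choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def normal_pdf(x, mean, std):
    """Gaussian probability density evaluated elementwise."""
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2.0 * np.pi))

# Toy dataset (assumption): smooth background in m, localized signal near m = 5.
n_bkg, n_sig = 20000, 300
m_bkg = rng.uniform(0.0, 10.0, n_bkg)
x_bkg = rng.normal(0.5 * m_bkg, 1.0)   # background feature depends on m
m_sig = rng.normal(5.0, 0.3, n_sig)
x_sig = rng.normal(6.0, 0.5, n_sig)    # signal feature is localized
m = np.concatenate([m_bkg, m_sig])
x = np.concatenate([x_bkg, x_sig])
is_signal = np.concatenate([np.zeros(n_bkg, dtype=bool), np.ones(n_sig, dtype=bool)])

# Signal region (SR) and sidebands (SB) are defined in the resonant variable m.
in_sr = (m > 4.0) & (m < 6.0)
in_sb = ~in_sr

# "Background generator" learned from sidebands only: here a linear-Gaussian
# model p_bkg(x | m), standing in for a learned conditional density.
slope, intercept = np.polyfit(m[in_sb], x[in_sb], 1)
resid_std = np.std(x[in_sb] - (slope * m[in_sb] + intercept))

# Interpolate the background model into the SR and evaluate both densities.
p_bkg = normal_pdf(x[in_sr], slope * m[in_sr] + intercept, resid_std)
p_sig = normal_pdf(x[in_sr], 6.0, 0.5)  # assumed-known signal template

# Unbinned likelihood scan over the signal fraction mu in the SR.
mu_grid = np.linspace(0.0, 0.3, 301)
mix = mu_grid[:, None] * p_sig[None, :] + (1.0 - mu_grid[:, None]) * p_bkg[None, :]
log_like = np.log(mix).sum(axis=1)

# Flat-prior posterior on the grid, normalized to sum to one.
posterior = np.exp(log_like - log_like.max())
posterior /= posterior.sum()

mu_hat = mu_grid[np.argmax(log_like)]
true_mu = is_signal[in_sr].sum() / in_sr.sum()
print(f"estimated signal fraction: {mu_hat:.3f}, true fraction in SR: {true_mu:.3f}")
```

In this toy, the output of the analysis is an estimate of the signal fraction with an associated posterior rather than an uncalibrated anomaly score, which is the sense in which the results are directly interpretable. A realistic analysis would replace the linear-Gaussian fit with a learned high-dimensional generator and would typically also infer signal shape parameters.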