🤖 AI Summary
High-dimensional scientific data often contain latent structures that are difficult to disentangle, and residual information—potentially encoding unmodeled physical signals—is frequently overlooked in generative modeling. Method: We propose a classifier-free-guided latent flow matching framework that explicitly decouples conditional information from residual representations, enabling interpretable extraction and controllable intervention of unmodeled scientific signals within generative models. By learning disentangled representations in latent space without label supervision, the method isolates semantically clear and physically interpretable features from the residual subspace. Results: Experiments on synthetic data, Colored MNIST, and the Galaxy10 astronomical dataset demonstrate substantial improvements in model analyzability and scientific interpretability. Our approach establishes a new paradigm for scientific discovery driven by high-dimensional data, bridging generative modeling with domain-informed, physics-aware analysis.
📝 Abstract
Accessing information in learned representations is critical for scientific discovery in high-dimensional domains. We introduce a novel method based on latent flow matching with classifier-free guidance that disentangles latent subspaces by explicitly separating information included in conditioning from information that remains in the residual representation. Across three experiments -- a synthetic 2D Gaussian toy problem, colored MNIST, and the Galaxy10 astronomy dataset -- we show that our method enables access to meaningful features of high dimensional data. Our results highlight a simple yet powerful mechanism for analyzing, controlling, and repurposing latent representations, providing a pathway toward using generative models for scientific exploration of what we don't capture, consider, or catalog.