What We Don't C: Representations for scientific discovery beyond VAEs

📅 2025-11-12
🤖 AI Summary
Problem: High-dimensional scientific data often contain latent structure that is difficult to disentangle, and residual information, which may encode unmodeled physical signals, is frequently overlooked in generative modeling. Method: The paper proposes a latent flow matching framework with classifier-free guidance that explicitly separates the information supplied through conditioning from the information left in the residual representation, so that unmodeled signals can be extracted, interpreted, and intervened on inside a generative model. Results: Experiments on a synthetic 2D Gaussian toy problem, Colored MNIST, and the Galaxy10 astronomy dataset show that the method gives access to meaningful features of high-dimensional data, pointing toward the use of generative models for scientific exploration of what the conditioning does not capture.

📝 Abstract
Accessing information in learned representations is critical for scientific discovery in high-dimensional domains. We introduce a method based on latent flow matching with classifier-free guidance that disentangles latent subspaces by explicitly separating information included in the conditioning from information that remains in the residual representation. Across three experiments (a synthetic 2D Gaussian toy problem, colored MNIST, and the Galaxy10 astronomy dataset), we show that our method enables access to meaningful features of high-dimensional data. Our results highlight a simple yet powerful mechanism for analyzing, controlling, and repurposing latent representations, providing a pathway toward using generative models for scientific exploration of what we don't capture, consider, or catalog.
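To make the guidance step concrete, here is a minimal sketch of how classifier-free guidance is commonly applied to a flow matching sampler: the conditional and unconditional velocity predictions are blended, and the result is integrated with fixed-step Euler. This is the generic formulation and not necessarily the paper's exact procedure. The names `guided_velocity`, `euler_sample`, and `toy_velocity` are hypothetical, and `toy_velocity` is a stand-in for a trained network.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, guidance_scale):
    """Blend conditional and unconditional velocity predictions.

    guidance_scale = 0 gives the unconditional field, 1 the conditional
    field, and values above 1 extrapolate past the conditional field.
    """
    return v_uncond + guidance_scale * (v_cond - v_uncond)

def euler_sample(velocity_fn, z0, cond, guidance_scale=2.0, n_steps=50):
    """Integrate dz/dt = v(z, t) from t = 0 to t = 1 with fixed-step Euler."""
    z = np.array(z0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        t = i * dt
        v_c = velocity_fn(z, t, cond)   # prediction with the condition
        v_u = velocity_fn(z, t, None)   # prediction with the condition dropped
        z = z + dt * guided_velocity(v_c, v_u, guidance_scale)
    return z

def toy_velocity(z, t, cond):
    """Hypothetical stand-in for a trained model (assumption for illustration):
    a constant drift equal to the condition, or zero drift when it is dropped."""
    if cond is None:
        return np.zeros_like(z)
    return np.asarray(cond, dtype=float)

z_final = euler_sample(toy_velocity, z0=[0.0, 0.0], cond=[1.0, -1.0],
                       guidance_scale=2.0)
```

With this toy field, the guidance scale simply rescales how far the sample drifts along the conditioned direction; in a real model the two predictions would come from the same network evaluated with and without the condition.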
Problem

Research questions and friction points this paper is trying to address.

Developing disentangled latent representations for scientific data analysis
Accessing meaningful features in high-dimensional datasets for discovery
Enabling analysis and control of information in generative model representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent flow matching with classifier-free guidance
Disentangles latent subspaces by separating conditioning information from the information left in the residual representation
Analyzes and controls latent representations for scientific exploration
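The ingredients listed above can be sketched as one training-batch construction. The example below assumes the widely used straight-line flow matching path between Gaussian noise and a data latent, plus random condition dropout so a single model can serve both the conditional and unconditional roles that classifier-free guidance needs. The function name `flow_matching_batch`, the dropout rate `p_drop`, and the choice of path are assumptions for illustration, not details taken from the paper; the latents would come from some encoder that is not shown.

```python
import numpy as np

def flow_matching_batch(latents, rng, p_drop=0.1):
    """Build one training batch for straight-path flow matching.

    Returns the interpolated points x_t, the sampled times t, a boolean
    mask saying which examples keep their condition, and the target
    velocity the model should regress onto.
    """
    x1 = np.asarray(latents, dtype=float)      # data latents (path endpoint)
    n = x1.shape[0]
    x0 = rng.standard_normal(x1.shape)         # Gaussian noise (path start)
    t = rng.uniform(size=(n, 1))               # one time per example in [0, 1)
    x_t = (1.0 - t) * x0 + t * x1              # point on the straight path
    target = x1 - x0                           # constant velocity of that path
    keep_cond = rng.uniform(size=n) >= p_drop  # False means: drop the condition
    return x_t, t, keep_cond, target

def mse_loss(pred, target):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
latents = np.ones((8, 2))                      # placeholder latents
x_t, t, keep_cond, target = flow_matching_batch(latents, rng)
```

During training, a model would receive `x_t`, `t`, and the condition (replaced by a null token wherever `keep_cond` is false) and be fitted to `target` with `mse_loss`.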
Brian Rogers
Oxford Astrophysics, University of Oxford
Micah Bowles
Schmidt AI in Science Fellow
Representations, Self-Supervised Learning, Astro
Chris J. Lintott
Oxford Astrophysics, University of Oxford
Steve Croft
Oxford Astrophysics, University of Oxford; Breakthrough Listen, University of California, Berkeley; SETI Institute