What We Don't C: Representations for scientific discovery beyond VAEs

📅 2025-11-12

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

High-dimensional scientific data often contain latent structure that is difficult to disentangle, and residual information, which may encode unmodeled physical signals, is frequently overlooked in generative modeling. Method: We propose a latent flow matching framework with classifier-free guidance that explicitly separates the information carried by the conditioning from the information left in the residual representation, so that unmodeled signals can be extracted, interpreted, and intervened on within the generative model. Because this separation is driven by the conditioning itself, features in the residual subspace are isolated without requiring labels for those features. Results: Experiments on a synthetic 2D Gaussian toy problem, Colored MNIST, and the Galaxy10 astronomy dataset show that the method exposes meaningful, interpretable features of high-dimensional data. The approach points toward using generative models for data-driven scientific discovery, connecting generative modeling with domain-informed analysis.


📝 Abstract

Accessing information in learned representations is critical for scientific discovery in high-dimensional domains. We introduce a novel method based on latent flow matching with classifier-free guidance that disentangles latent subspaces by explicitly separating information included in the conditioning from information that remains in the residual representation. Across three experiments (a synthetic 2D Gaussian toy problem, colored MNIST, and the Galaxy10 astronomy dataset), we show that our method enables access to meaningful features of high-dimensional data. Our results highlight a simple yet powerful mechanism for analyzing, controlling, and repurposing latent representations, providing a pathway toward using generative models for scientific exploration of what we don't capture, consider, or catalog.
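For readers unfamiliar with the training side, here is a minimal sketch of how latent flow matching with classifier-free guidance is commonly set up. It assumes straight-line interpolation paths between Gaussian noise and data latents, plus a reserved null label for condition dropout. The paper's exact path, dropout rate, and guidance weight are not given on this page, and NULL_LABEL, flow_matching_batch, flow_matching_loss, and cfg_velocity are illustrative names, not the authors' code.

```python
import numpy as np

NULL_LABEL = -1  # hypothetical "no condition" token used for classifier-free guidance


def flow_matching_batch(x1, labels, p_uncond, rng):
    """Build regression inputs and targets for conditional flow matching.

    x1:       (n, d) data latents (e.g. autoencoder encodings), assumed given
    labels:   (n,) non-negative integer conditioning labels
    p_uncond: probability of replacing a label with NULL_LABEL
    """
    n, d = x1.shape
    x0 = rng.standard_normal((n, d))             # noise endpoint of each path
    t = rng.uniform(0.0, 1.0, size=(n, 1))       # one time per sample
    xt = (1.0 - t) * x0 + t * x1                 # point on the straight-line path
    target = x1 - x0                             # constant velocity of that path
    drop = rng.uniform(size=n) < p_uncond        # condition dropout for CFG
    cond = np.where(drop, NULL_LABEL, labels)
    return xt, t, cond, target, x0


def flow_matching_loss(model, xt, t, cond, target):
    """Mean squared error between predicted and target velocities."""
    return np.mean(np.sum((model(xt, t, cond) - target) ** 2, axis=1))


def cfg_velocity(v_cond, v_uncond, w):
    """Classifier-free guidance: move from the unconditional toward the conditional velocity."""
    return v_uncond + w * (v_cond - v_uncond)


def zero_model(xt, t, cond):
    """Placeholder standing in for a trained velocity network."""
    return np.zeros_like(xt)


rng = np.random.default_rng(0)
x1 = rng.standard_normal((1000, 4))              # stand-in for encoded latents
labels = rng.integers(0, 10, size=1000)          # stand-in class labels
xt, t, cond, target, x0 = flow_matching_batch(x1, labels, p_uncond=0.1, rng=rng)
print(flow_matching_loss(zero_model, xt, t, cond, target))
```

Training one network on both labeled and null-labeled samples lets the same model supply the conditional and unconditional velocities that cfg_velocity combines at sampling time; w = 1 recovers plain conditional sampling and w = 0 ignores the condition.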

Problem

Research questions and friction points this paper is trying to address.

Developing disentangled latent representations for scientific data analysis

Accessing meaningful features in high-dimensional datasets for discovery

Enabling analysis and control of information in generative model representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent flow matching with classifier-free guidance

Disentangles latent subspaces by separating conditioning information from residual information

Analyzes and controls latent representations for scientific exploration
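This page does not say how the residual is read out. One plausible reading, sketched here under our own assumptions, is to run the conditional flow backwards from a data point to noise, so that the noise holds whatever the condition does not explain, and then forwards again under a different condition. The toy uses two hypothetical 2D Gaussian classes with equal spread, for which the flow matching velocity E[x1 - x0 | x_t = x] has a closed form, so no training is needed. MEANS, S, velocity, integrate, and swap_condition are illustrative names, not the authors' code.

```python
import numpy as np

S = 0.5  # assumed within-class standard deviation of the toy Gaussians
MEANS = {0: np.array([-2.0, 0.0]), 1: np.array([2.0, 1.0])}  # hypothetical class means


def velocity(x, t, label):
    """Exact flow matching velocity for x1 ~ N(mu, S^2 I), x0 ~ N(0, I), x_t = (1-t) x0 + t x1."""
    mu = MEANS[label]
    var_t = (1.0 - t) ** 2 + (t * S) ** 2        # per-dimension variance of x_t
    gain = (t * S**2 - (1.0 - t)) / var_t        # Cov(x1 - x0, x_t) / Var(x_t)
    return mu + gain * (x - t * mu)


def integrate(x, label, t_start, t_end, steps=2000):
    """Euler-integrate dx/dt = velocity(x, t, label) from t_start to t_end."""
    dt = (t_end - t_start) / steps
    t = t_start
    for _ in range(steps):
        x = x + dt * velocity(x, t, label)
        t += dt
    return x


def swap_condition(x1, label_from, label_to):
    """Encode under one label, then decode the same residual under another."""
    z = integrate(x1, label_from, 1.0, 0.0)      # data -> noise; z keeps the residual
    return integrate(z, label_to, 0.0, 1.0)      # noise -> data under the new label


x1 = MEANS[0] + S * np.array([0.3, -0.8])        # class-0 point with a specific offset
x_swapped = swap_condition(x1, 0, 1)
print(x_swapped)  # expected to be close to x1 - MEANS[0] + MEANS[1]
```

In this Gaussian case the flow maps a point to z = (x1 - mu)/S, so the swap keeps the point's offset from its class mean and moves only the class-dependent part; in a learned model the same encode/decode round trip would be approximate rather than exact.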
