A Deep Generative Approach to Stratified Learning

📅 2026-04-12

🤖 AI Summary

Complex data often lie on stratified spaces, unions of manifolds (strata) of varying dimensions, which challenge conventional generative models because of varying dimensionality, singularities at intersections, and a lack of efficient models. This work proposes two deep generative frameworks: a dimension-aware mixture of variational autoencoders fitted by sieve maximum likelihood estimation, and a diffusion-based approach that exploits the score field structure of a mixture. The study establishes convergence rates for learning both the ambient and intrinsic distributions on such spaces; the rates depend on the intrinsic dimensions and smoothness of the strata, and the analysis connects geometric structure, ambient noise level, and learning performance. It also proposes an algorithm that consistently estimates both the number of strata and their intrinsic dimensions. Simulations and real-data applications, including molecular dynamics, demonstrate the effectiveness of the methods.
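
To make the idea of recovering the pieces of such a space and their intrinsic dimensions concrete, here is a minimal Python sketch. It is not the paper's algorithm: it swaps in classical local PCA on k-nearest-neighbour patches as a stand-in for the score-based estimator, and the toy data (a line crossing a plane in 3-D), the function names, and the thresholds `k`, `var_threshold`, and `min_fraction` are all assumptions chosen for illustration.

```python
import numpy as np

def local_pca_dimension(points, k=20, var_threshold=0.05):
    """Estimate a local intrinsic dimension at every point via k-NN PCA.

    Stand-in technique (local PCA), not the score-based estimator.
    """
    n = points.shape[0]
    # Pairwise squared distances (fine for a few thousand points).
    sq = np.sum(points ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * points @ points.T
    # Indices of the k nearest points (the point itself is included).
    nbrs = np.argpartition(d2, k, axis=1)[:, :k]
    dims = np.empty(n, dtype=int)
    for i in range(n):
        local = points[nbrs[i]] - points[nbrs[i]].mean(axis=0)
        # Squared singular values are proportional to the PCA variances.
        s = np.linalg.svd(local, compute_uv=False) ** 2
        # Count directions carrying a non-negligible share of the variance.
        dims[i] = int(np.sum(s / s.sum() > var_threshold))
    return dims

def summarize_strata(dims, min_fraction=0.1):
    """Keep each local dimension that covers at least min_fraction of points."""
    values, counts = np.unique(dims, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)
            if c / dims.size >= min_fraction}

# Hypothetical toy stratified space: a line (1-D) crossing a plane (2-D).
rng = np.random.default_rng(0)
line = np.zeros((500, 3))
line[:, 2] = rng.uniform(-1.0, 1.0, 500)
plane = np.zeros((1000, 3))
plane[:, :2] = rng.uniform(-1.0, 1.0, (1000, 2))
# Small ambient noise, far below the size of a neighbourhood.
data = np.vstack([line, plane]) + 1e-3 * rng.standard_normal((1500, 3))

dims = local_pca_dimension(data)
strata = summarize_strata(dims)
print(sorted(strata))  # local dimensions that survive the fraction filter
```

Points close to the intersection see neighbours from both pieces and can be assigned a spurious dimension, which is why the summary step discards dimensions supported by only a small fraction of the sample. This is the kind of singular behaviour that a purely local estimator handles poorly.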

📝 Abstract

While the manifold hypothesis is widely adopted in modern machine learning, complex data is often better modeled as lying on stratified spaces -- unions of manifolds (strata) of varying dimensions. Stratified learning is challenging due to varying dimensionality, intersection singularities, and the lack of efficient models for learning the underlying distributions. We provide a deep generative approach to stratified learning by developing two generative frameworks for learning distributions on stratified spaces. The first is a sieve maximum likelihood approach realized via a dimension-aware mixture of variational autoencoders. The second is a diffusion-based framework that explores the score field structure of a mixture. We establish the convergence rates for learning both the ambient and intrinsic distributions, which are shown to depend on the intrinsic dimensions and smoothness of the underlying strata. Utilizing the geometry of the score field, we also establish consistency for estimating the intrinsic dimension of each stratum and propose an algorithm that consistently estimates both the number of strata and their dimensions. Theoretical results for both frameworks provide fundamental insights into the interplay of the underlying geometry, the ambient noise level, and deep generative models. Extensive simulations and real dataset applications, such as molecular dynamics, demonstrate the effectiveness of our methods.
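
The phrase "score field structure of a mixture" can be unpacked with a standard identity. The notation below is an assumption made for illustration and does not come from the abstract: $p_\sigma$ denotes the data density after adding ambient noise at level $\sigma$, written as a mixture with one component $p_{k,\sigma}$ and weight $\pi_k$ per stratum.

```latex
% Assumed notation: K strata, mixing weights \pi_k, noised component densities p_{k,\sigma}.
\begin{align*}
p_\sigma(x) &= \sum_{k=1}^{K} \pi_k \, p_{k,\sigma}(x), \\
\nabla_x \log p_\sigma(x)
  &= \frac{\sum_{k=1}^{K} \pi_k \, \nabla_x p_{k,\sigma}(x)}{p_\sigma(x)}
   = \sum_{k=1}^{K} w_k(x) \, \nabla_x \log p_{k,\sigma}(x), \\
w_k(x) &= \frac{\pi_k \, p_{k,\sigma}(x)}{\sum_{j=1}^{K} \pi_j \, p_{j,\sigma}(x)}.
\end{align*}
```

The mixture score is therefore a responsibility-weighted average of the per-stratum scores. Heuristically, away from intersections a single weight $w_k(x)$ dominates and the score field locally resembles that of one stratum, which is one way to see why the geometry of the score field could carry information about each stratum's dimension.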

Problem

Research questions and friction points this paper is trying to address.

stratified spaces

manifold hypothesis

dimensionality variation

intersection singularities

distribution learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

stratified learning

deep generative models

score-based diffusion