🤖 AI Summary
Conventional multi-component VAEs model complex inter-component dependencies poorly, especially spatial or structural ones in data such as industrial assembly sequences and multimodal imaging, because their oversimplified aggregation mechanisms break geometric and semantic consistency across components.

Method: We propose GMRF MCVAE, which embeds a Gaussian Markov Random Field (GMRF) in both the prior and the posterior of a VAE to capture dynamic, structured dependencies among components. This replaces the standard static independence assumption with a learnable, graph-based dependency structure.

Contribution/Results: To our knowledge, this is the first work to integrate GMRFs into the probabilistic graphical structure of multi-component VAEs, yielding an interpretable, differentiable graph-prior framework for modeling complex dependencies. Experiments show state-of-the-art performance on synthetic Copula data, competitive results on PolyMNIST, and significantly better part coherence on the real-world BIKED industrial assembly data (+23.6% structural similarity).
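A GMRF is a multivariate Gaussian parameterized by a sparse precision matrix whose zero pattern mirrors a graph: components that share no edge are conditionally independent given the rest. The following minimal NumPy sketch illustrates that idea over one scalar latent per component. It is not the authors' implementation; the function names, the precision construction Q = αI + (D − A) (identity ridge plus graph Laplacian), and the example chain graph are all assumptions made for illustration. In a learned model the edge weights in A would presumably be trainable parameters.

```python
import numpy as np


def gmrf_precision(adjacency, alpha=1.0):
    """Precision matrix Q = alpha * I + (D - A) for an undirected component graph.

    A is a symmetric adjacency matrix with non-negative edge weights and D is its
    degree matrix. Off the diagonal, Q[i, j] is nonzero only where components i and j
    share an edge, so non-adjacent components are conditionally independent given
    all others. D - A (the graph Laplacian) is positive semi-definite, so alpha > 0
    makes Q positive definite.
    """
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))
    return alpha * np.eye(A.shape[0]) + D - A


def gmrf_sample(mu, Q, n, rng):
    """Draw n samples from N(mu, Q^{-1}); returns an array of shape (K, n)."""
    L = np.linalg.cholesky(Q)                  # Q = L @ L.T, L lower-triangular
    eps = rng.standard_normal((Q.shape[0], n))
    # Solving L.T @ x = eps gives Cov(x) = L^{-T} L^{-1} = Q^{-1}.
    x = np.linalg.solve(L.T, eps)
    return mu[:, None] + x


def gmrf_logpdf(z, mu, Q):
    """Log-density of N(mu, Q^{-1}) at z, computed from the precision directly."""
    L = np.linalg.cholesky(Q)
    r = z - mu
    logdet_Q = 2.0 * np.sum(np.log(np.diag(L)))   # log|Q| from the Cholesky diagonal
    return 0.5 * (logdet_Q - r @ Q @ r - z.shape[0] * np.log(2.0 * np.pi))


# Hypothetical 4-component chain graph: 0 - 1 - 2 - 3
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Q = gmrf_precision(A, alpha=1.0)
rng = np.random.default_rng(0)
samples = gmrf_sample(np.zeros(4), Q, n=5, rng=rng)
print(samples.shape)  # (4, 5)
print(gmrf_logpdf(samples[:, 0], np.zeros(4), Q))
```

Working with the precision rather than the covariance keeps the graph structure explicit: Q stays sparse even though its inverse, the covariance, is generally dense, so neighbouring components are coupled directly and distant ones only through the chain.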
📝 Abstract
Multi-component datasets with intricate dependencies, such as industrial assemblies or multi-modal imaging, challenge current generative modeling techniques. Existing Multi-component Variational AutoEncoders typically rely on simplified aggregation strategies that neglect critical nuances and consequently compromise structural coherence across generated components. To address this gap, we introduce the Gaussian Markov Random Field Multi-Component Variational AutoEncoder (GMRF MCVAE), a novel generative framework that embeds Gaussian Markov Random Fields into both the prior and the posterior distributions. This design explicitly models cross-component relationships, enabling richer representations and faithful reproduction of complex interactions. Empirically, GMRF MCVAE achieves state-of-the-art performance on a synthetic Copula dataset constructed specifically to evaluate intricate component relationships, delivers competitive results on the PolyMNIST benchmark, and significantly improves structural coherence on the real-world BIKED dataset. Our results indicate that GMRF MCVAE is especially well suited to practical applications that demand robust, realistic modeling of multi-component coherence.
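In a VAE, the prior and posterior meet in the KL term of the ELBO. When both are Gaussian with precision matrices, that term has a closed form that can be evaluated without ever forming a covariance matrix. The sketch below assumes each distribution is a full-rank Gaussian over K component latents (one scalar per component, for brevity) with precisions Q_q (posterior) and Q_p (prior). The helpers `logdet_pd` and `kl_gmrf` and the example matrices are hypothetical, and this is not necessarily how the authors compute the term.

```python
import numpy as np


def logdet_pd(M):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    return 2.0 * np.sum(np.log(np.diag(np.linalg.cholesky(M))))


def kl_gmrf(mu_q, Q_q, mu_p, Q_p):
    """KL( N(mu_q, Q_q^{-1}) || N(mu_p, Q_p^{-1}) ), expressed with precisions."""
    K = mu_q.shape[0]
    diff = mu_p - mu_q
    # tr(Q_q^{-1} Q_p) equals tr(Q_p Sigma_q) by the cyclic property of the trace
    trace_term = np.trace(np.linalg.solve(Q_q, Q_p))
    quad_term = diff @ Q_p @ diff
    # log|Sigma_p| - log|Sigma_q| = log|Q_q| - log|Q_p|
    logdet_term = logdet_pd(Q_q) - logdet_pd(Q_p)
    return 0.5 * (trace_term + quad_term - K + logdet_term)


# Hypothetical 3-component chain graph; prior precision = I + graph Laplacian
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
Q_prior = np.eye(3) + np.diag(A.sum(axis=1)) - A
# Hypothetical posterior: same sparsity pattern, sharper (as an encoder might output)
Q_post = 4.0 * Q_prior
mu_post = np.array([0.2, -0.1, 0.4])
print(kl_gmrf(mu_post, Q_post, np.zeros(3), Q_prior))
```

Because both precisions share the graph's sparsity pattern in this example, a larger model could replace the dense `solve` and Cholesky calls with sparse factorizations; the dense versions are used here only to keep the sketch short.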