AI Summary
This work proposes NeuroSQL, a deep generative framework that eliminates the need for auxiliary networks such as encoders or discriminators, thereby addressing the training instability, high computational cost, and mode collapse common in conventional models. By leveraging the linear assignment problem from optimal transport theory and introducing a quantile assignment mechanism, NeuroSQL implicitly constructs low-dimensional latent representations, which are then fed into a standalone generator. The approach avoids explicit encoding and adversarial training altogether. Experiments on MNIST, CelebA, AFHQ, and OASIS show that NeuroSQL achieves better overall image quality (lower mean pixel distance and stronger perceptual/structural fidelity) than VAEs, GANs, and a budget-matched diffusion baseline, while requiring the least training time. Notably, it remains effective when only limited training samples are available, offering a stable and efficient way to generate synthetic data with minimal information loss.
Abstract
Deep generative models (DGMs) play two key roles in modern machine learning: (i) producing new information (e.g., image synthesis) and (ii) reducing dimensionality. However, traditional architectures often rely on auxiliary networks, such as encoders in Variational Autoencoders (VAEs) or discriminators in Generative Adversarial Networks (GANs), which introduce training instability, computational overhead, and risks such as mode collapse. We present NeuroSQL, a new generative paradigm that eliminates the need for auxiliary networks by learning low-dimensional latent representations implicitly. NeuroSQL leverages an asymptotic approximation that expresses the latent variables as the solution to an optimal transport problem. Specifically, NeuroSQL learns the latent variables by solving a linear assignment problem and then passes the latent information to a standalone generator. We benchmark its performance against GANs, VAEs, and a budget-matched diffusion baseline on four datasets: handwritten digits (MNIST), faces (CelebA), animal faces (AFHQ), and brain images (OASIS). Compared to VAEs, GANs, and diffusion models: (1) in terms of image quality, NeuroSQL achieves a lower overall mean pixel distance between synthetic and authentic images and stronger perceptual/structural fidelity; (2) computationally, NeuroSQL requires the least training time; and (3) practically, NeuroSQL provides an effective solution for generating synthetic data from limited training samples. By using quantile assignment in place of an encoder, NeuroSQL provides a fast, stable, and robust way to generate synthetic data with minimal information loss.
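The idea of obtaining latent codes from a linear assignment problem, rather than from an encoder, can be illustrated with a toy sketch. It assumes, purely for illustration, a one-dimensional latent space whose fixed grid points are quantiles of a standard normal, a hypothetical affine `toy_generator` standing in for a trained network, and squared distance as the matching cost. Under these assumptions, each observation x_i is paired with a latent point by choosing the permutation σ that minimizes Σ_i (x_i − G(z_σ(i)))², which is a linear assignment problem. The names `latent_quantiles`, `solve_assignment`, and `toy_generator` are hypothetical, and the exhaustive solver is exact but only feasible for a handful of samples; this is not presented as NeuroSQL's actual implementation.

```python
from itertools import permutations
from statistics import NormalDist


def latent_quantiles(n):
    """Hypothetical 1-D latent grid: the (j + 0.5) / n quantiles of N(0, 1)."""
    std_normal = NormalDist(mu=0.0, sigma=1.0)
    return [std_normal.inv_cdf((j + 0.5) / n) for j in range(n)]


def solve_assignment(cost):
    """Exact linear assignment by exhaustive search.

    Returns perm such that sample i is matched to latent point perm[i] and
    sum(cost[i][perm[i]]) is minimal. Exponential in n, so toy inputs only;
    a Hungarian-type solver is the practical choice at scale.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda p: sum(cost[i][p[i]] for i in range(n)),
    )
    return list(best)


def toy_generator(z):
    """Hypothetical stand-in for a trained generator (affine map, 1-D output)."""
    return 2.0 * z + 1.0


# Toy "images": three scalar observations.
data = [3.0, -1.0, 1.0]
latents = latent_quantiles(len(data))

# Squared distance between each observation and each generated point.
cost = [[(x - toy_generator(z)) ** 2 for z in latents] for x in data]

# pairing[i] is the index of the latent quantile assigned to data[i].
pairing = solve_assignment(cost)

# Each observation now has an implicit latent code, with no encoder involved.
codes = [latents[j] for j in pairing]
```

In this toy case the optimal matching is simply sorted order (the largest observation receives the largest latent quantile), which is the known form of one-dimensional optimal transport under a convex cost. At realistic sizes one would replace `solve_assignment` with a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment`, and the matched `codes` would serve as inputs for training the generator.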