🤖 AI Summary
This work addresses two critical challenges in generative modeling, particularly with normalizing flows: low inference efficiency and difficulty in practical deployment. To this end, we propose invertible 3×3 convolutions and a novel Quad-coupling layer, design a parallel inversion algorithm for convolutional layers, and introduce the first backpropagation-based training mechanism for the inverse of convolution in normalizing flows. Based on these innovations, we develop Affine-StableSR, a lightweight super-resolution model. At the application level, we present unsupervised geological feature extraction, conditional GAN-based seed purity assessment, and diffusion-based frameworks for privacy preservation and art restoration. Our technical contributions span normalizing flows, conditional GANs, fine-tuned Stable Diffusion, and image inpainting. The proposed methods significantly improve both inference and training efficiency: Affine-StableSR reduces parameter count by 42% while attaining state-of-the-art accuracy, and the applied systems demonstrate real-world deployability across agricultural quality inspection, geological mapping, autonomous driving data anonymization, and cultural heritage restoration.
📝 Abstract
This thesis presents novel contributions in two primary areas: advancing the efficiency of generative models, particularly normalizing flows, and applying generative models to solve real-world computer vision challenges. The first part introduces significant improvements to normalizing flow architectures through six key innovations: 1) invertible 3×3 convolution layers with mathematically proven necessary and sufficient conditions for invertibility; 2) a more efficient Quad-coupling layer; 3) a fast and efficient parallel inversion algorithm for k×k convolutional layers; 4) a fast and efficient backpropagation algorithm for the inverse of convolution; 5) Inverse-Flow, which uses the inverse of convolution in the forward pass and is trained with the proposed backpropagation algorithm; and 6) Affine-StableSR, a compact and efficient super-resolution model that leverages pre-trained weights and normalizing flow layers to reduce parameter count while maintaining performance. The second part applies generative models to real-world problems: 1) an automated quality assessment system for agricultural produce that uses conditional GANs to address class imbalance, data scarcity, and annotation challenges, achieving good accuracy in seed purity testing; 2) an unsupervised geological mapping framework that uses stacked autoencoders for dimensionality reduction, showing improved feature extraction compared to conventional methods; 3) a privacy-preserving method for autonomous driving datasets based on face detection and image inpainting; 4) Stable Diffusion based image inpainting that replaces detected faces and license plates, advancing privacy-preserving techniques and ethical considerations in the field; and 5) an adapted diffusion model for art restoration that effectively handles multiple types of degradation through unified fine-tuning.
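To make the idea of an invertible convolution concrete, the following is a minimal 1D sketch and not the thesis's 3×3 layer or its parallel inversion algorithm. It assumes zero padding on the left only, so each output depends on the current input and earlier ones; the convolution then acts as a lower-triangular linear map whose diagonal entries all equal the last kernel weight, and it can be inverted by back-substitution whenever that weight is nonzero. The function names `causal_conv1d` and `invert_causal_conv1d` are hypothetical and chosen for illustration.

```python
def causal_conv1d(x, w):
    """Left-padded 1D convolution: y[i] = sum_j w[j] * x[i - (k-1) + j].

    Inputs left of index 0 are treated as zero (assumed padding scheme).
    """
    k = len(w)
    y = []
    for i in range(len(x)):
        acc = 0.0
        for j in range(k):
            src = i - (k - 1) + j
            if src >= 0:
                acc += w[j] * x[src]
        y.append(acc)
    return y


def invert_causal_conv1d(y, w):
    """Recover x from y by back-substitution, one position at a time."""
    k = len(w)
    if w[-1] == 0:
        # The triangular matrix has w[-1] on its diagonal, so a zero
        # last weight makes the map singular.
        raise ValueError("not invertible: last kernel weight is zero")
    x = []
    for i in range(len(y)):
        acc = y[i]
        # Subtract contributions of already-recovered earlier inputs.
        for j in range(k - 1):
            src = i - (k - 1) + j
            if src >= 0:
                acc -= w[j] * x[src]
        x.append(acc / w[-1])
    return x


x = [1.0, -2.0, 0.5, 3.0]
w = [0.5, -1.0, 2.0]
y = causal_conv1d(x, w)
x_rec = invert_causal_conv1d(y, w)
```

In this sketch the inversion loop is inherently sequential, since each recovered value depends on the previous ones. That sequential dependency is the kind of cost that a parallel inversion algorithm, such as the one the abstract describes for k×k layers, would aim to reduce.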