MGAug: Multimodal Geometric Augmentation in Latent Spaces of Image Deformations

πŸ“… 2023-12-20
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 1
✨ Influential: 0
πŸ€– AI Summary
Existing image geometric augmentation methods typically assume transformations follow unimodal distributions, limiting their ability to model the prevalent multimodal geometric variations observed in real-world data. To address this, we propose the first diffeomorphic geometric augmentation framework explicitly designed for multimodal deformation modeling. Our approach introduces a Gaussian mixture prior in the tangent space of deformation fields and integrates it with a variational autoencoder to enable differentiable, stochastic sampling from multimodal latent distributions. This design overcomes the restrictive unimodal assumption, substantially enhancing both expressivity and fidelity of geometric transformations. Experiments demonstrate significant improvements over state-of-the-art transformation-based augmentation methods on 2D synthetic image classification and 3D brain MRI segmentation tasks, validating the framework’s generalizability and practical effectiveness.
πŸ“ Abstract
Geometric transformations have been widely used to augment the size of training datasets. Existing methods often assume a unimodal distribution of the underlying transformations between images, which limits their power when data with multimodal distributions occur. In this paper, we propose a novel model, Multimodal Geometric Augmentation (MGAug), that for the first time generates augmenting transformations in a multimodal latent space of geometric deformations. To achieve this, we first develop a deep network that embeds the learning of latent geometric spaces of diffeomorphic transformations (a.k.a. diffeomorphisms) in a variational autoencoder (VAE). A mixture of multivariate Gaussians is formulated in the tangent space of diffeomorphisms and serves as a prior to approximate the hidden distribution of image transformations. We then augment the original training dataset by deforming images using randomly sampled transformations from the learned multimodal latent space of the VAE. To validate the effectiveness of our model, we jointly learn the augmentation strategy with two distinct domain-specific tasks: multi-class classification on 2D synthetic datasets and segmentation on real 3D brain magnetic resonance images (MRIs). We also compare MGAug with state-of-the-art transformation-based image augmentation algorithms. Experimental results show that our proposed approach outperforms all baselines by significantly improved prediction accuracy. Our code is publicly available at https://github.com/tonmoy-hossain/MGAug.
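The augmentation loop the abstract describes — sample a latent code from a mixture of Gaussians in the tangent space, decode it to a velocity field, integrate to a diffeomorphism, and warp a training image — can be sketched in NumPy/SciPy. This is a toy illustration, not the paper's implementation: the learned VAE decoder is replaced by a fixed smooth basis expansion, and all dimensions and mixture parameters are made-up values.

```python
import numpy as np
from scipy.ndimage import map_coordinates

rng = np.random.default_rng(0)

# Mixture-of-Gaussians prior over latent codes (K components, D dims; toy values).
K, D = 3, 8
weights = np.full(K, 1.0 / K)
means = rng.normal(0.0, 1.0, (K, D))
stds = np.full((K, D), 0.5)

def sample_latent():
    """Draw z from the mixture: pick a component, then sample its Gaussian."""
    k = rng.choice(K, p=weights)
    return means[k] + stds[k] * rng.standard_normal(D)

# Stand-in "decoder": map a latent code to a smooth 2-channel velocity field.
H = W = 32
yy, xx = np.meshgrid(np.arange(H, dtype=float), np.arange(W, dtype=float),
                     indexing="ij")
basis = np.stack([np.sin(2 * np.pi * (i + 1) * xx / W)
                  * np.cos(2 * np.pi * (i + 1) * yy / H)
                  for i in range(D // 2)])            # (D//2, H, W) smooth basis

def decode_velocity(z, scale=1.5):
    vy = np.tensordot(z[: D // 2], basis, axes=1)     # linear combination per axis
    vx = np.tensordot(z[D // 2 :], basis, axes=1)
    return scale * np.stack([vy, vx])                 # (2, H, W)

def integrate(v, steps=6):
    """Scaling-and-squaring: approximate the diffeomorphism exp(v) by halving
    the stationary velocity field and composing it with itself `steps` times."""
    u = v / (2.0 ** steps)
    for _ in range(steps):
        warped = np.stack([map_coordinates(u[c], [yy + u[0], xx + u[1]],
                                           order=1, mode="nearest")
                           for c in range(2)])
        u = u + warped                                # u <- u o (id + u) + u
    return u

def warp(img, u):
    """Resample the image along the deformed grid id + u."""
    return map_coordinates(img, [yy + u[0], xx + u[1]], order=1, mode="nearest")

# One augmented sample: a fresh diffeomorphic warp of a toy checkerboard image.
image = ((xx // 8 + yy // 8) % 2).astype(float)
u = integrate(decode_velocity(sample_latent()))
augmented = warp(image, u)
```

Each call to `sample_latent` can land in a different mixture component, so repeated augmentation draws warps from genuinely different deformation modes rather than a single Gaussian cloud.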
Problem

Research questions and friction points this paper is trying to address.

Generates multimodal geometric augmentations for training images
Learns latent spaces of diffeomorphic transformations using VAE
Improves prediction accuracy in classification and segmentation tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal latent space for geometric deformations
VAE with mixture of Gaussians prior
Joint learning of augmentation and tasks
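The "VAE with mixture of Gaussians prior" contribution hinges on sampling from the mixture in a way gradients can pass through. One standard construction (shown here as an illustration, not necessarily the paper's exact mechanism) combines a Gumbel-softmax relaxation of the component choice with the usual Gaussian reparameterization; all shapes and parameter values below are assumed:

```python
import numpy as np

rng = np.random.default_rng(1)

def gumbel_softmax(logits, tau=0.5):
    """Differentiable relaxation of a categorical draw over mixture components."""
    g = -np.log(-np.log(rng.uniform(1e-9, 1.0, logits.shape)))  # Gumbel(0,1) noise
    y = (logits + g) / tau
    e = np.exp(y - y.max())
    return e / e.sum()

def sample_gmm_reparam(logits, means, log_stds):
    """Reparameterized GMM sample: soft component weights times per-component
    Gaussian samples, so mixture weights, means, and stds all stay trainable."""
    w = gumbel_softmax(logits)                 # (K,) soft one-hot weights
    eps = rng.standard_normal(means.shape)     # (K, D) standard normal noise
    comps = means + np.exp(log_stds) * eps     # mu_k + sigma_k * eps_k
    return w @ comps                           # (D,) convex combination

K, D = 3, 8
z = sample_gmm_reparam(np.zeros(K), rng.normal(size=(K, D)), np.full((K, D), -1.0))
```

Because the component selection is soft, the same sampler can sit inside an end-to-end loss, which is what makes jointly learning the augmentation with a downstream classification or segmentation task possible.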