MANGO: Learning Disentangled Image Transformation Manifolds with Grouped Operators

📅 2024-09-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Learning semantically meaningful, disentangled geometric transformations (such as rotation, stroke thickness, and blur) from image examples remains challenging: in existing Manifold Autoencoders (MAEs), the learned operators are not guaranteed to be disentangled, and training is prohibitively expensive at scale. Method: The paper proposes MANGO (transformation Manifolds with Grouped Operators), a framework that lets practitioners specify which transformations to model, improving the semantic meaning of the learned operators. MANGO models decoupled transformation manifolds in distinct latent subspaces via grouped Lie group operators, supporting composition of transformations and a one-phase end-to-end training routine. Contribution/Results: Grounded in Lie group theory, MANGO improves the interpretability and controllability of learned transformations and achieves a 100x training speedup over prior MAE approaches, offering an efficient, interpretable approach to controllable image generation and unsupervised representation learning.

📝 Abstract
Learning semantically meaningful image transformations (e.g., rotation, thickness, blur) directly from examples can be a challenging task. Recently, the Manifold Autoencoder (MAE) proposed using a set of Lie group operators to learn image transformations directly from examples. However, this approach has limitations, as the learned operators are not guaranteed to be disentangled and the training routine is prohibitively expensive when scaling up the model. To address these limitations, we propose MANGO (transformation Manifolds with Grouped Operators) for learning disentangled operators that describe image transformations in distinct latent subspaces. Moreover, our approach allows practitioners to define which transformations they aim to model, thus improving the semantic meaning of the learned operators. Through our experiments, we demonstrate that MANGO enables composition of image transformations and introduces a one-phase training routine that leads to a 100x speedup over prior works.
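The core idea in the abstract, each transformation acting through a Lie group operator confined to its own latent subspace, can be sketched numerically. The snippet below is a minimal illustrative example, not the paper's implementation: the latent dimension, the number of groups, and the random generators are all assumptions, and a simple truncated-Taylor matrix exponential stands in for a library routine. It shows why grouping disentangles operators: exponentiating a block-diagonal generator moves each subspace independently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 6-dim latent split into three 2-dim subspaces,
# one per user-specified transformation (e.g. rotation, thickness, blur).
dim, n_groups = 6, 3
sub = dim // n_groups

def expm(A, n_terms=30):
    """Matrix exponential via truncated Taylor series (fine for small A)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        out = out + term
    return out

def grouped_operator(blocks):
    """Embed per-group generators as blocks of a block-diagonal matrix."""
    A = np.zeros((dim, dim))
    for g, B in enumerate(blocks):
        s = g * sub
        A[s:s + sub, s:s + sub] = B
    return A

# One random generator per group (illustrative, not learned operators).
blocks = [rng.standard_normal((sub, sub)) for _ in range(n_groups)]

def transform(z, coeffs):
    """Apply exp(sum_g c_g * A_g) to latent z; groups act independently."""
    A = grouped_operator([c * B for c, B in zip(coeffs, blocks)])
    return expm(A) @ z

z = rng.standard_normal(dim)
# Acting only on group 0 leaves the other subspaces untouched, which is
# exactly the disentanglement the block structure enforces.
z1 = transform(z, [0.5, 0.0, 0.0])
print(np.allclose(z1[sub:], z[sub:]))  # other subspaces unchanged
```

Composition of transformations then amounts to summing scaled generators before exponentiating, since block-diagonal generators for different groups commute.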
Problem

Research questions and friction points this paper is trying to address.

Learning disentangled image transformation operators
Improving semantic meaning of learned transformations
Speeding up training routine for transformation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning disentangled operators in subspaces
User-defined transformations for semantic meaning
One-phase training with 100x speedup
Brighton Ancelin
PhD Student, Georgia Institute of Technology
Machine Learning
Yenho Chen
Georgia Institute of Technology
Peimeng Guan
Georgia Institute of Technology
Chiraag Kaushik
Georgia Institute of Technology
Belen Martin-Urcelay
Georgia Institute of Technology
Alex Saad-Falcon
ML PhD Student, Georgia Tech
machine learning, computational physics
Nakul Singh
Georgia Institute of Technology