MANGO: Learning Disentangled Image Transformation Manifolds with Grouped Operators

📅 2024-09-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Learning semantically meaningful, disentangled geometric transformations (such as rotation, stroke thickness, and blur) from image examples remains challenging: in existing Manifold Autoencoders (MAEs), the learned operators are not guaranteed to be disentangled, and training is prohibitively expensive at scale. Method: The paper proposes MANGO (transformation Manifolds with Grouped Operators), a framework that lets practitioners specify which transformations to model, improving the semantic meaning of the learned operators. MANGO models decoupled transformation manifolds in distinct latent subspaces via grouped Lie group operators, supporting composition of transformations and a one-phase end-to-end training routine. Contribution/Results: Grounded in Lie group theory, MANGO improves the interpretability and controllability of learned transformations and achieves a 100x training speedup over prior MAE approaches, offering an efficient, interpretable approach to controllable image generation and unsupervised representation learning.

📝 Abstract
Learning semantically meaningful image transformations (e.g., rotation, thickness, blur) directly from examples can be a challenging task. Recently, the Manifold Autoencoder (MAE) proposed using a set of Lie group operators to learn image transformations directly from examples. However, this approach has limitations, as the learned operators are not guaranteed to be disentangled and the training routine is prohibitively expensive when scaling up the model. To address these limitations, we propose MANGO (transformation Manifolds with Grouped Operators) for learning disentangled operators that describe image transformations in distinct latent subspaces. Moreover, our approach allows practitioners to define which transformations they aim to model, thus improving the semantic meaning of the learned operators. Through our experiments, we demonstrate that MANGO enables composition of image transformations and introduces a one-phase training routine that leads to a 100x speedup over prior works.
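The core idea in the abstract, each transformation acting through a Lie group operator confined to its own latent subspace, can be sketched numerically. The snippet below is a minimal illustrative example, not the paper's implementation: the latent dimension, the number of groups, and the random generators are all assumptions, and a simple truncated-Taylor matrix exponential stands in for a library routine. It shows why grouping disentangles operators: exponentiating a block-diagonal generator moves each subspace independently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: a 6-dim latent split into three 2-dim subspaces,
# one per user-specified transformation (e.g. rotation, thickness, blur).
dim, n_groups = 6, 3
sub = dim // n_groups

def expm(A, n_terms=30):
    """Matrix exponential via truncated Taylor series (fine for small A)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, n_terms):
        term = term @ A / k
        out = out + term
    return out

def grouped_operator(blocks):
    """Embed per-group generators as blocks of a block-diagonal matrix."""
    A = np.zeros((dim, dim))
    for g, B in enumerate(blocks):
        s = g * sub
        A[s:s + sub, s:s + sub] = B
    return A

# One random generator per group (illustrative, not learned operators).
blocks = [rng.standard_normal((sub, sub)) for _ in range(n_groups)]

def transform(z, coeffs):
    """Apply exp(sum_g c_g * A_g) to latent z; groups act independently."""
    A = grouped_operator([c * B for c, B in zip(coeffs, blocks)])
    return expm(A) @ z

z = rng.standard_normal(dim)
# Acting only on group 0 leaves the other subspaces untouched, which is
# exactly the disentanglement the block structure enforces.
z1 = transform(z, [0.5, 0.0, 0.0])
print(np.allclose(z1[sub:], z[sub:]))  # other subspaces unchanged
```

Composition of transformations then amounts to summing scaled generators before exponentiating, since block-diagonal generators for different groups commute.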
Problem

Research questions and friction points this paper is trying to address.

Learning disentangled image transformation operators
Improving semantic meaning of learned transformations
Speeding up training routine for transformation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning disentangled operators in subspaces
User-defined transformations for semantic meaning
One-phase training with 100x speedup
Brighton Ancelin
PhD Student, Georgia Institute of Technology
Machine Learning
Yenho Chen
Georgia Institute of Technology
Peimeng Guan
Georgia Institute of Technology
Chiraag Kaushik
Georgia Institute of Technology
Belen Martin-Urcelay
Georgia Institute of Technology
Alex Saad-Falcon
ML PhD Student, Georgia Tech
machine learning, computational physics
Nakul Singh
Georgia Institute of Technology