🤖 AI Summary
This work addresses the challenge of recovering the quantized structure of latent factors under nonlinear mappings in unsupervised disentanglement. We propose Cliff, a method that encourages axis-aligned, mutually independent discontinuities, termed “cliffs”, in the density of the learned latent variables. It turns the theoretical identifiability of quantized factors, which holds under any diffeomorphism, into an operational learning objective. Cliff uses density estimation to detect sharp transitions in the latent space and an objective that encourages the location of the cliffs along each dimension to be independent of the values of the other dimensions. On the disentanglement benchmarks evaluated, Cliff outperforms the baselines, demonstrating its effectiveness in unsupervised disentanglement.
📝 Abstract
Recent theoretical work established the unsupervised identifiability of quantized factors under any diffeomorphism. The theory assumes that quantization thresholds correspond to axis-aligned discontinuities in the probability density of the latent factors. By constraining a learned map to have a density with axis-aligned discontinuities, we can recover the quantization of the factors. However, translating this high-level principle into an effective practical criterion remains challenging, especially under nonlinear maps. Here, we develop a criterion for unsupervised disentanglement by encouraging axis-aligned discontinuities. Discontinuities manifest as sharp changes in the estimated density of factors and form what we call cliffs. Following the definition of independent discontinuities from the theory, we encourage the location of the cliffs along a factor to be independent of the values of the other factors. We show that our method, Cliff, outperforms the baselines on all disentanglement benchmarks, demonstrating its effectiveness in unsupervised disentanglement.
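The notion of an independent cliff can be illustrated with a small toy experiment. The sketch below is not the paper's objective, which the abstract does not spell out; it only shows what it means for the location of a cliff along one factor to be independent of the other factor. Everything in it is an assumption made for illustration: the two Gaussian factors, the thinning step that creates the density drop, the histogram-based detector, and the hypothetical helper names `sample_factors`, `cliff_location`, and `cliff_spread`.

```python
import numpy as np


def sample_factors(n, threshold=0.0, keep_prob=0.1, seed=0):
    """Draw 2-D factors whose density drops sharply where z[:, 0] crosses `threshold`."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, 2))
    # Thin out points beyond the threshold, so the density falls by a
    # factor of `keep_prob` there (an illustrative way to build a cliff).
    keep = (z[:, 0] < threshold) | (rng.random(n) < keep_prob)
    return z[keep]


def cliff_location(values, bins):
    """Return the interior bin edge where the histogram changes most abruptly."""
    counts, edges = np.histogram(values, bins=bins)
    jumps = np.abs(np.diff(counts))       # change between adjacent bins
    return edges[1:-1][np.argmax(jumps)]  # edge shared by those two bins


def cliff_spread(x, slice_centers=(-1.0, 0.0, 1.0), half_width=0.1):
    """Locate the cliff along x[:, 0] inside thin slices of x[:, 1].

    Returns the range (max - min) of the detected locations and the
    locations themselves. A small range means the cliff along the first
    coordinate does not move when the second coordinate changes.
    """
    bins = np.linspace(-2.0, 2.0, 41)  # bin width 0.1
    locations = []
    for center in slice_centers:
        in_slice = np.abs(x[:, 1] - center) < half_width
        locations.append(cliff_location(x[in_slice, 0], bins))
    locations = np.array(locations)
    return locations.max() - locations.min(), locations


# Axis-aligned representation: the cliff sits at z[:, 0] = 0 in every slice.
z = sample_factors(1_000_000)

# Entangled representation: rotate the same points by 30 degrees, so the
# cliff is no longer aligned with a coordinate axis.
theta = np.pi / 6
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta), np.cos(theta)]])
x_rotated = z @ rotation.T

spread_aligned, loc_aligned = cliff_spread(z)
spread_rotated, loc_rotated = cliff_spread(x_rotated)

print("aligned cliff locations:", loc_aligned, "spread:", spread_aligned)
print("rotated cliff locations:", loc_rotated, "spread:", spread_rotated)
```

In the aligned case the detected cliff stays at the same position in every slice, whereas after the rotation its position shifts with the value of the second coordinate. A learning criterion in the spirit of the abstract would favor representations of the first kind; how Cliff measures and optimizes this is defined in the paper itself, not here.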