Latent Zoning Network: A Unified Principle for Generative Modeling, Representation Learning, and Classification

📅 2025-09-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work unifies three core machine learning tasks—generative modeling, representation learning, and classification—within a single coherent framework. To this end, we propose the Latent Zoning Network (LZN), which constructs a shared Gaussian latent space and employs task-specific encoders to map heterogeneous data sources into mutually exclusive latent zones; a unified decoder enables joint cross-task modeling. LZN is the first architecture to natively support end-to-end co-optimization of unsupervised representation learning (without auxiliary losses), generative modeling, and supervised classification within one framework. Its modular design facilitates multi-task joint training without reliance on task-specific loss functions. Experiments demonstrate state-of-the-art performance: FID improves to 2.59 on CIFAR-10; linear evaluation on ImageNet achieves +9.3% and +0.2% top-1 accuracy over MoCo and SimCLR, respectively; and classification accuracy reaches new SOTA levels.

Technology Category

Application Category

📝 Abstract
Generative modeling, representation learning, and classification are three core problems in machine learning (ML), yet their state-of-the-art (SoTA) solutions remain largely disjoint. In this paper, we ask: Can a unified principle address all three? Such unification could simplify ML pipelines and foster greater synergy across tasks. We introduce Latent Zoning Network (LZN) as a step toward this goal. At its core, LZN creates a shared Gaussian latent space that encodes information across all tasks. Each data type (e.g., images, text, labels) is equipped with an encoder that maps samples to disjoint latent zones, and a decoder that maps latents back to data. ML tasks are expressed as compositions of these encoders and decoders: for example, label-conditional image generation uses a label encoder and image decoder; image embedding uses an image encoder; classification uses an image encoder and label decoder. We demonstrate the promise of LZN in three increasingly complex scenarios: (1) LZN can enhance existing models (image generation): When combined with the SoTA Rectified Flow model, LZN improves FID on CIFAR10 from 2.76 to 2.59-without modifying the training objective. (2) LZN can solve tasks independently (representation learning): LZN can implement unsupervised representation learning without auxiliary loss functions, outperforming the seminal MoCo and SimCLR methods by 9.3% and 0.2%, respectively, on downstream linear classification on ImageNet. (3) LZN can solve multiple tasks simultaneously (joint generation and classification): With image and label encoders/decoders, LZN performs both tasks jointly by design, improving FID and achieving SoTA classification accuracy on CIFAR10. The code and trained models are available at https://github.com/microsoft/latent-zoning-networks. The project website is at https://zinanlin.me/blogs/latent_zoning_networks.html.
Problem

Research questions and friction points this paper is trying to address.

Unifying generative modeling, representation learning, and classification tasks
Creating shared Gaussian latent space for disjoint ML solutions
Enhancing performance across image generation and classification tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Shared Gaussian latent space for all tasks
Disjoint latent zones for different data types
Task composition via encoder-decoder modules
🔎 Similar Papers
No similar papers found.