CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models

📅 2026-03-16
🤖 AI Summary
Existing vision models typically rely on separate modules for image understanding and generation, which hinders coherent reasoning and efficient learning. This work proposes CyCLeGen, a unified vision-language foundation model that handles both understanding and generation through cyclic image↔layout generation within a single autoregressive framework. By combining cycle-consistency learning with synthetic supervision under a reinforcement learning objective, the model gains introspective abilities and data-efficient self-improvement. Experiments show significant gains across multiple image understanding and generation benchmarks, supporting the potential of a unified architecture.

📝 Abstract
We present CyCLeGen, a unified vision-language foundation model capable of both image understanding and image generation within a single autoregressive framework. Unlike existing vision models that depend on separate modules for perception and synthesis, CyCLeGen adopts a fully integrated architecture that enforces cycle-consistent learning through image->layout->image and layout->image->layout generation loops. This unified formulation introduces two key advantages: introspection, enabling the model to reason about its own generations, and data efficiency, allowing self-improvement via synthetic supervision under a reinforcement learning objective guided by cycle consistency. Extensive experiments show that CyCLeGen achieves significant gains across diverse image understanding and generation benchmarks, highlighting the potential of unified vision-language foundation models.
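
To make the layout->image->layout loop concrete, the following is a minimal sketch of how a cycle-consistency reward could be scored. It is not the paper's implementation: the layout representation (labelled axis-aligned boxes), the IoU-based score, the label-only matching, and the names `generate_image` and `predict_layout` are all assumptions introduced for illustration.

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed format
Layout = List[Tuple[str, Box]]           # (label, box) pairs, assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_cycle_reward(original: Layout, reconstructed: Layout) -> float:
    """Mean best-match IoU over the original boxes.

    Each original box is compared with every reconstructed box that has
    the same label, and the best IoU is kept. This is a simplification:
    several original boxes may match the same reconstructed box.
    """
    if not original:
        return 0.0
    total = 0.0
    for label, box in original:
        scores = [iou(box, r_box) for r_label, r_box in reconstructed
                  if r_label == label]
        total += max(scores, default=0.0)
    return total / len(original)


def cycle_reward(layout: Layout,
                 generate_image: Callable[[Layout], Any],
                 predict_layout: Callable[[Any], Layout]) -> float:
    """Score one layout->image->layout round trip.

    `generate_image` and `predict_layout` are hypothetical stand-ins for
    the two directions of a unified model.
    """
    image = generate_image(layout)
    return layout_cycle_reward(layout, predict_layout(image))
```

A reward of this kind could serve as the scalar signal in a reinforcement learning objective, since it needs only the input layout and the model's own outputs rather than extra annotations. The image->layout->image direction would need an image-space similarity measure instead, which this sketch does not cover.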

Problem

Research questions and friction points this paper is trying to address.

vision foundation models
image understanding
image generation
cycle consistency
unified architecture

Innovation

Methods, ideas, or system contributions that make the work stand out.

cycle-consistent learning
unified vision-language model
autoregressive generation
layout prediction
self-improvement