CyCLeGen: Cycle-Consistent Layout Prediction and Image Generation in Vision Foundation Models

📅 2026-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing vision models typically rely on separate modules for image understanding and image generation, which hinders coherent reasoning and efficient learning within one architecture. This work proposes CyCLeGen, a unified vision-language foundation model that handles both understanding and generation in a single autoregressive framework through cyclic image↔layout generation (image→layout→image and layout→image→layout loops). Combining cycle-consistency learning with synthetic supervision under a reinforcement learning objective gives the model introspection, meaning it can reason about its own generations, and lets it self-improve data-efficiently. The authors report significant gains across diverse image understanding and generation benchmarks, which they present as evidence for the potential of unified architectures.

📝 Abstract

We present CyCLeGen, a unified vision-language foundation model capable of both image understanding and image generation within a single autoregressive framework. Unlike existing vision models that depend on separate modules for perception and synthesis, CyCLeGen adopts a fully integrated architecture that enforces cycle-consistent learning through image->layout->image and layout->image->layout generation loops. This unified formulation introduces two key advantages: introspection, enabling the model to reason about its own generations, and data efficiency, allowing self-improvement via synthetic supervision under a reinforcement learning objective guided by cycle consistency. Extensive experiments show that CyCLeGen achieves significant gains across diverse image understanding and generation benchmarks, highlighting the potential of unified vision-language foundation models.
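The abstract does not spell out how cycle consistency is scored, so the following is only a minimal sketch of one way a layout→image→layout loop could be turned into a scalar reward. Everything here is an assumption made for illustration: the layout representation (labeled, normalized boxes), the IoU-based agreement score, and the names `generate_image`, `predict_layout`, `layout_agreement`, and `cycle_reward` are hypothetical and are not taken from the paper.

```python
from typing import Any, Callable, List, Tuple

# Hypothetical layout format: (label, (x0, y0, x1, y1)) with normalized coordinates.
Box = Tuple[float, float, float, float]
Layout = List[Tuple[str, Box]]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_agreement(source: Layout, recovered: Layout) -> float:
    """Mean best same-label IoU over the source boxes (1.0 = perfect recovery).

    Simplification: extra boxes in `recovered` are not penalized.
    """
    if not source:
        return 1.0 if not recovered else 0.0
    total = 0.0
    for label, box in source:
        scores = [iou(box, rbox) for rlabel, rbox in recovered if rlabel == label]
        total += max(scores, default=0.0)
    return total / len(source)


def cycle_reward(
    layout: Layout,
    generate_image: Callable[[Layout], Any],  # stand-in for layout -> image
    predict_layout: Callable[[Any], Layout],  # stand-in for image -> layout
) -> float:
    """Score one layout -> image -> layout loop by how well the layout survives."""
    image = generate_image(layout)
    recovered = predict_layout(image)
    return layout_agreement(layout, recovered)


# Toy usage with stub "models": the image is simply the layout passed through.
layout = [("cat", (0.0, 0.0, 0.5, 0.5)), ("dog", (0.5, 0.5, 1.0, 1.0))]
perfect = cycle_reward(layout, lambda l: l, lambda img: img)
lossy = cycle_reward(layout, lambda l: l, lambda img: img[:1])  # drops the dog box
```

In this toy setup the reward is a number in [0, 1] that could serve as the signal for a reinforcement learning update on synthetic layouts. The image→layout→image direction would need an image-similarity measure instead of box IoU, which this sketch leaves out.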

Problem

Research questions and friction points this paper is trying to address.

vision foundation models

image understanding

image generation

cycle consistency

unified architecture

Innovation

Methods, ideas, or system contributions that make the work stand out.

cycle-consistent learning

unified vision-language model

autoregressive generation