LeX-Art: Rethinking Text Generation via Scalable High-Quality Data Synthesis

📅 2025-03-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To bridge the gap between prompt expressiveness and text rendering fidelity in text-to-image generation, this paper introduces LeX-Art, a full-stack synthesis suite. Methodologically, it constructs LeX-10K, a high-fidelity dataset of 10K aesthetically refined text-image pairs; designs LeX-Enhancer, a prompt enrichment model; and trains two text-to-image models, LeX-FLUX and LeX-Lumina. Contributions include PNED (Pairwise Normalized Edit Distance), a novel metric for robust text accuracy evaluation, and LeX-Bench, a comprehensive benchmark assessing fidelity, aesthetics, and alignment. Experiments show LeX-Lumina achieves a 79.81% PNED gain on CreateBench, while LeX-FLUX outperforms baselines by 3.18%, 4.45%, and 3.81% in color, positional, and font accuracy, respectively. All code, models, and data are publicly released.

📝 Abstract
We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on Deepseek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 79.81% PNED gain on CreateBench, and LeX-FLUX outperforming baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our codes, models, datasets, and demo are publicly available.
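The paper's exact PNED formulation is not reproduced on this page; as a rough illustration of the underlying idea, a minimal sketch of a length-normalized edit distance between rendered and target text is shown below. The function names and the normalization by the longer string are assumptions for illustration, not the paper's definition.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[len(b)]

def normalized_edit_distance(pred: str, target: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length.

    Hypothetical normalization; the paper's PNED may differ in detail.
    """
    if not pred and not target:
        return 0.0
    return levenshtein(pred, target) / max(len(pred), len(target))
```

A score of 0 means the rendered text matches the prompt's target text exactly; values near 1 indicate the rendered text is almost entirely wrong.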
Problem

Research questions and friction points this paper is trying to address.

Bridging gap between prompt expressiveness and text rendering fidelity
Creating high-quality text-image synthesis dataset and models
Developing benchmark for systematic text generation evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-quality data synthesis pipeline using Deepseek-R1
Prompt enrichment model LeX-Enhancer for improved expressiveness
State-of-the-art text-to-image models LeX-FLUX and LeX-Lumina
Authors

Shitian Zhao, Shanghai AI Laboratory (LLM, MLLM, Generative Model)
Qilong Wu, Shanghai AI Laboratory
Xinyue Li, Shanghai AI Laboratory
Bo Zhang, Shanghai AI Laboratory
Ming Li, Shanghai AI Laboratory
Qi Qin, Shanghai AI Laboratory
Dongyang Liu, MMLab, CUHK (Image/Video Generation, LLMs, VLMs)
Kaipeng Zhang, Shanghai AI Laboratory (LLM, Multimodal LLMs, AIGC)
Hongsheng Li, The Chinese University of Hong Kong
Yu Qiao, Shanghai AI Laboratory
Peng Gao, Shanghai AI Laboratory
Bin Fu, Shanghai AI Laboratory
Zhen Li, The Chinese University of Hong Kong