🤖 AI Summary
To bridge the gap between prompt expressiveness and text rendering fidelity in text-to-image generation, this paper introduces LeX-Art, a full-stack suite for text-image synthesis. Methodologically, it builds a DeepSeek-R1-based data synthesis pipeline to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024×1024 images; designs LeX-Enhancer, a prompt enrichment model; and trains two text-to-image models, LeX-FLUX and LeX-Lumina. Contributions also include PNED (Pairwise Normalized Edit Distance), a novel metric for text accuracy, and LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment. Experiments show LeX-Lumina achieves a 79.81% PNED gain on CreateBench, while LeX-FLUX outperforms baselines by 3.18%, 4.45%, and 3.81% in color, positional, and font accuracy, respectively. All code, models, datasets, and a demo are publicly released.
📝 Abstract
We introduce LeX-Art, a comprehensive suite for high-quality text-image synthesis that systematically bridges the gap between prompt expressiveness and text rendering fidelity. Our approach follows a data-centric paradigm, constructing a high-quality data synthesis pipeline based on DeepSeek-R1 to curate LeX-10K, a dataset of 10K high-resolution, aesthetically refined 1024$\times$1024 images. Beyond dataset construction, we develop LeX-Enhancer, a robust prompt enrichment model, and train two text-to-image models, LeX-FLUX and LeX-Lumina, achieving state-of-the-art text rendering performance. To systematically evaluate visual text generation, we introduce LeX-Bench, a benchmark that assesses fidelity, aesthetics, and alignment, complemented by Pairwise Normalized Edit Distance (PNED), a novel metric for robust text accuracy evaluation. Experiments demonstrate significant improvements, with LeX-Lumina achieving a 79.81% PNED gain on CreateBench, and LeX-FLUX outperforming baselines in color (+3.18%), positional (+4.45%), and font accuracy (+3.81%). Our code, models, datasets, and demo are publicly available.