Automatic Text Box Placement for Supporting Typographic Design

📅 2025-10-08

🤖 AI Summary

This work addresses automatic text box placement in incomplete layouts for advertising and web design, where a layout must balance visual appeal with communication efficiency. It compares four approaches: a standard Transformer, a small vision-language model (Phi-3.5-vision), a large pretrained multimodal model (Gemini), and an extended Transformer that accepts multiple images as input. Experiments on the Crello dataset show that the Transformer-based models generally outperform the VLM-based approaches, particularly when richer appearance information is provided as input. These results point to the value of task-specific architectures and explicit appearance modeling for layout generation. The study also identifies a limitation shared by all methods: they struggle with very small text and densely populated layouts.

📝 Abstract

In layout design for advertisements and web pages, balancing visual appeal and communication efficiency is crucial. This study examines automated text box placement in incomplete layouts, comparing a standard Transformer-based method, a small Vision and Language Model (Phi-3.5-vision), a large pretrained VLM (Gemini), and an extended Transformer that processes multiple images. Evaluations on the Crello dataset show the standard Transformer-based models generally outperform VLM-based approaches, particularly when incorporating richer appearance information. However, all methods face challenges with very small text or densely populated layouts. These findings highlight the benefits of task-specific architectures and suggest avenues for further improvement in automated layout design.
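At its core, the task asks a model to predict where a missing text box should go on a partially filled canvas. The abstract does not say how boxes are encoded or which metrics the study reports, so the sketch below is only an illustration under stated assumptions: boxes are given as canvas-normalized (x, y, w, h) with a top-left origin, and a prediction is compared with the ground-truth box using intersection over union (IoU), a common overlap measure. The names `Box` and `iou` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical text box in canvas-normalized coordinates (0..1).

    Assumption: x, y give the top-left corner; w, h give width and height.
    """
    x: float
    y: float
    w: float
    h: float

    @property
    def area(self) -> float:
        # Clamp negative sizes to zero so malformed boxes contribute no area.
        return max(0.0, self.w) * max(0.0, self.h)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    ix = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    iy = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    inter = ix * iy
    union = a.area + b.area - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    target = Box(0.10, 0.10, 0.40, 0.20)     # hypothetical ground-truth text box
    predicted = Box(0.20, 0.10, 0.40, 0.20)  # hypothetical model prediction
    print(round(iou(target, predicted), 3))  # prints 0.6
```

Because IoU is relative to box size, the same absolute offset lowers the score more for a small box than for a large one, which is one reason overlap-based scores are unforgiving for very small text.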

Problem

Research questions and friction points this paper is trying to address.

Automating text box placement in incomplete layout designs

Comparing Transformer and Vision-Language Model performance

Identifying failure cases with very small text and densely populated layouts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based models generally outperform Vision-Language Models

Extended Transformer that takes multiple images as input

Task-specific architectures enhance automated text placement

🔎 Similar Papers

TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation