🤖 AI Summary
This work addresses content-aware automated layout generation, a foundational yet underexplored problem. We propose CAL-RAG, a retrieval-augmented multi-agent framework for this task. The method is a closed-loop pipeline with four stages: (1) multimodal layout example retrieval, (2) LLM-driven structured element recommendation, (3) vision-language grading of the proposed layout, and (4) feedback-guided iterative refinement. Together, these stages arrange text, logos, underlays, and other elements for both semantic alignment and visual coherence. The framework is implemented atop LangGraph and is presented as an interpretable, high-fidelity solution. Evaluated on the PKU PosterLayout dataset, it achieves state-of-the-art performance, substantially outperforming baselines such as LayoutPrompter on metrics including underlay effectiveness, element alignment, and overlap.
📝 Abstract
Automated content-aware layout generation -- the task of arranging visual elements such as text, logos, and underlays on a background canvas -- remains a fundamental yet under-explored problem in intelligent design systems. While recent advances in deep generative models and large language models (LLMs) have shown promise in structured content generation, most existing approaches lack grounding in contextual design exemplars and fall short in handling semantic alignment and visual coherence. In this work we introduce CAL-RAG, a retrieval-augmented, agentic framework for content-aware layout generation that integrates multimodal retrieval, large language models, and collaborative agentic reasoning. Our system retrieves relevant layout examples from a structured knowledge base and invokes an LLM-based layout recommender to propose structured element placements. A vision-language grader agent evaluates the layout with visual metrics, and a feedback agent provides targeted refinements, enabling iterative improvement. We implement our framework using LangGraph and evaluate it on the PKU PosterLayout dataset, a benchmark rich in semantic and structural variability. CAL-RAG achieves state-of-the-art performance across multiple layout metrics -- including underlay effectiveness, element alignment, and overlap -- substantially outperforming strong baselines such as LayoutPrompter. These results demonstrate that combining retrieval augmentation with agentic multi-step reasoning yields a scalable, interpretable, and high-fidelity solution for automated layout generation.
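The retrieve, recommend, grade, and refine loop described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it does not use LangGraph, the names `Box`, `overlap_score`, `generate_layout`, and the `toy_*` stubs are hypothetical, and the grader is a simple pairwise overlap ratio standing in for the vision-language grader agent. The retriever, recommender, and feedback agent are passed in as callables so that real components could be swapped in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Box:
    """One layout element in normalized canvas coordinates (hypothetical type)."""
    kind: str  # e.g. "text", "logo", "underlay"
    x: float
    y: float
    w: float
    h: float


Layout = List[Box]


def intersection_area(a: Box, b: Box) -> float:
    """Area of the rectangle shared by two boxes (0.0 if they do not touch)."""
    dx = min(a.x + a.w, b.x + b.w) - max(a.x, b.x)
    dy = min(a.y + a.h, b.y + b.h) - max(a.y, b.y)
    return max(dx, 0.0) * max(dy, 0.0)


def overlap_score(layout: Layout) -> float:
    """Toy grader: mean pairwise overlap among non-underlay elements.

    Each pair contributes intersection / smaller box area; lower is better.
    This is an assumed stand-in, not the metric definition from the paper.
    """
    elems = [b for b in layout if b.kind != "underlay"]
    total, pairs = 0.0, 0
    for i in range(len(elems)):
        for j in range(i + 1, len(elems)):
            smaller = min(elems[i].w * elems[i].h, elems[j].w * elems[j].h)
            if smaller > 0:
                total += intersection_area(elems[i], elems[j]) / smaller
            pairs += 1
    return total / pairs if pairs else 0.0


def generate_layout(
    retrieve: Callable[[], List[Layout]],
    recommend: Callable[[List[Layout], str], Layout],
    grade: Callable[[Layout], float],
    give_feedback: Callable[[Layout, float], str],
    threshold: float = 0.0,
    max_rounds: int = 3,
) -> Tuple[Layout, float]:
    """Closed loop: retrieve once, then recommend, grade, and refine."""
    examples = retrieve()
    feedback = ""
    best_layout: Layout = []
    best_score = float("inf")
    for _ in range(max_rounds):
        layout = recommend(examples, feedback)
        score = grade(layout)
        if score < best_score:
            best_layout, best_score = layout, score
        if score <= threshold:
            break  # the grader accepts the layout
        feedback = give_feedback(layout, score)
    return best_layout, best_score


# Toy stand-ins for the retriever, the LLM recommender, and the feedback agent.
def toy_retrieve() -> List[Layout]:
    return [[Box("text", 0.1, 0.1, 0.8, 0.2)]]


def toy_recommend(examples: List[Layout], feedback: str) -> Layout:
    # The first proposal overlaps; once feedback arrives, the logo moves down.
    logo_y = 0.5 if feedback else 0.2
    return [Box("text", 0.1, 0.1, 0.8, 0.2), Box("logo", 0.1, logo_y, 0.2, 0.2)]


def toy_feedback(layout: Layout, score: float) -> str:
    return "reduce overlap between text and logo"


if __name__ == "__main__":
    final_layout, final_score = generate_layout(
        toy_retrieve, toy_recommend, overlap_score, toy_feedback
    )
    print(final_score, final_layout)
```

In this sketch the loop stops as soon as the grader's score reaches the threshold or the round budget runs out, and it returns the best layout seen so far. A graph-based orchestrator such as LangGraph could express the same control flow as nodes with a conditional edge from the grader back to the recommender.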