🤖 AI Summary
This study addresses the text-comprehension challenges faced by individuals with intellectual disabilities by proposing an accessibility-oriented vision-language co-optimization method. To this end, we design five structured prompt templates aligned with the Web Content Accessibility Guidelines (WCAG), which map simplified textual inputs to highly comprehensible images. We systematically investigate the interplay among visual style, data source, and semantic alignment. Experiments are conducted on a sentence-level dataset of 400 samples, evaluated via both CLIPScore-based automatic metrics and expert human annotation. Results indicate that the *Basic Object Focus* template achieves the best semantic alignment, that the *Retro* visual style significantly enhances image comprehensibility, and that Wikipedia proves the most suitable data source for accessibility objectives. This work delivers the first reproducible prompt-engineering framework and empirically grounded design guidelines for AI-driven accessible content generation.
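To make the template idea concrete, here is a minimal sketch of how a structured prompt template with accessibility constraints might look. The wording, function name, and parameters are illustrative assumptions, not the templates used in the study.

```python
def basic_object_focus_prompt(simplified_text: str,
                              style: str = "Retro",
                              max_objects: int = 3) -> str:
    """Hypothetical 'Basic Object Focus'-style template: wraps a
    simplified sentence in accessibility constraints (object count
    limit, spatial separation, content restrictions)."""
    return (
        f"A {style.lower()}-style illustration of: {simplified_text} "
        f"Show at most {max_objects} clearly separated objects on a plain background. "
        "No embedded text, no clutter, no abstract symbols."
    )

prompt = basic_object_focus_prompt("A dog runs in the park.")
```

The point of templating is reproducibility: the same constraints are applied uniformly across all input sentences, so differences in output quality can be attributed to the template, style, or data source rather than to ad hoc prompt wording.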
📝 Abstract
Individuals with intellectual disabilities often have difficulty comprehending complex texts. While many text-to-image models prioritize aesthetics over accessibility, it remains unclear how visual illustrations relate to the text simplifications (TS) from which they are generated. This paper presents a structured vision-language model (VLM) prompting framework for generating accessible images from simplified texts. We designed five prompt templates, namely Basic Object Focus, Contextual Scene, Educational Layout, Multi-Level Detail, and Grid Layout, each following a distinct spatial arrangement while adhering to accessibility constraints such as object count limits, spatial separation, and content restrictions. Using 400 sentence-level simplifications from four established TS datasets (OneStopEnglish, SimPA, Wikipedia, and ASSET), we conducted a two-phase evaluation: Phase 1 assessed prompt template effectiveness with CLIPScores, and Phase 2 involved human annotation of generated images across ten visual styles by four accessibility experts. Results show that the Basic Object Focus template achieved the highest semantic alignment, indicating that visual minimalism enhances language accessibility. Expert evaluation further identified the Retro style as the most accessible and Wikipedia as the most effective data source. Inter-annotator agreement varied across dimensions, with Text Simplicity showing strong reliability and Image Quality proving more subjective. Overall, our framework offers practical guidelines for accessible content generation and underscores the importance of structured prompting in AI-generated visual accessibility tools.
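The Phase 1 metric can be sketched from the standard CLIPScore definition (Hessel et al., 2021): a rescaled, clipped cosine similarity between CLIP image and text embeddings. The snippet below shows only this scoring formula on precomputed embedding vectors; obtaining the embeddings from an actual CLIP model is assumed and not shown here.

```python
import numpy as np

def clipscore(image_emb, text_emb, w: float = 2.5) -> float:
    """CLIPScore formula: w * max(cos(image_emb, text_emb), 0).

    image_emb / text_emb are assumed to be CLIP embedding vectors
    of the generated image and the simplified source sentence.
    """
    image_emb = np.asarray(image_emb, dtype=float)
    text_emb = np.asarray(text_emb, dtype=float)
    cos = image_emb @ text_emb / (
        np.linalg.norm(image_emb) * np.linalg.norm(text_emb)
    )
    return w * max(float(cos), 0.0)  # negative similarity is clipped to 0
```

Because the score is bounded below by zero and scaled by `w`, higher values indicate tighter semantic alignment between the generated image and the simplified sentence, which is how template effectiveness is compared in Phase 1.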