LEARN: A Story-Driven Layout-to-Image Generation Framework for STEM Instruction

📅 2025-08-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address challenges in STEM education, including the difficulty of representing abstract concepts, attention fragmentation exacerbated by short-form media, and cognitively misaligned illustrations, this paper introduces LEARN, which the authors describe as the first layout-aware, narrative-driven diffusion framework. The method treats page layout as a visual narrative scaffold, integrating semantic structure learning with cognitive scaffolding. It combines layout-conditioned generation, contrastive vision–language alignment training, and prompt modulation to improve spatial organization and semantic coherence while reducing extraneous cognitive load in instructional illustrations. Leveraging a curated BookCover dataset that provides narrative layouts and structured visual cues, the model aims to produce coherent visual sequences with strong conceptual continuity, and it shows potential for integration with curriculum-linked knowledge graphs. The work proposes a unified generative framework for AI-augmented educational content creation that brings pedagogical principles into diffusion-based image synthesis.

📝 Abstract
LEARN is a layout-aware diffusion framework designed to generate pedagogically aligned illustrations for STEM education. It leverages a curated BookCover dataset that provides narrative layouts and structured visual cues, enabling the model to depict abstract and sequential scientific concepts with strong semantic alignment. Through layout-conditioned generation, contrastive visual-semantic training, and prompt modulation, LEARN produces coherent visual sequences that support mid-to-high-level reasoning in line with Bloom's taxonomy while reducing extraneous cognitive load as emphasized by Cognitive Load Theory. By fostering spatially organized and story-driven narratives, the framework counters fragmented attention often induced by short-form media and promotes sustained conceptual focus. Beyond static diagrams, LEARN demonstrates potential for integration with multimodal systems and curriculum-linked knowledge graphs to create adaptive, exploratory educational content. As the first generative approach to unify layout-based storytelling, semantic structure learning, and cognitive scaffolding, LEARN represents a novel direction for generative AI in education. The code and dataset will be released to facilitate future research and practical deployment.
Problem

Research questions and friction points this paper is trying to address.

Generating pedagogically aligned STEM illustrations with layout-aware diffusion
Reducing extraneous cognitive load while supporting mid-to-high-level reasoning
Countering fragmented attention with story-driven, spatially organized narratives
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layout-aware diffusion framework for STEM education
Contrastive visual-semantic training for alignment
Potential for integration with multimodal systems and curriculum-linked knowledge graphs
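
The contrastive visual-semantic training listed above is not specified in detail in this summary. As a rough illustration only, the sketch below shows a CLIP-style symmetric InfoNCE loss, a common choice for aligning image and text embeddings; it is an assumption, not necessarily the objective LEARN uses. The function names, the `temperature` default, and the toy embeddings are all hypothetical.

```python
import math


def _normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_alignment_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired image/text embeddings.

    Hypothetical helper: pair i of image_embs and text_embs is treated as
    the positive, and every other combination in the batch as a negative.
    """
    imgs = [_normalize(v) for v in image_embs]
    txts = [_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    image_to_text = _cross_entropy_diag(logits)
    text_to_image = _cross_entropy_diag(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


# Toy usage: matched pairs should score a much lower loss than swapped pairs.
matched = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In a real training loop the embeddings would come from learned image and text encoders and the loss would be computed with an autodiff library; the pure-Python version here only makes the arithmetic explicit.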
Maoquan Zhang
Graduate School of Advanced Science and Engineering, Hiroshima University, Hiroshima 739-8511, Japan
Bisser Raytchev
Hiroshima University, Department of Information Engineering
computer vision - machine learning - AI - image processing - brain-inspired computing - medical image analysis - high-dimensiona
Xiujuan Sun
Department of Computer Science, Weifang University of Science and Technology, Shouguang 262700, China