🤖 AI Summary
Existing PCGRL methods struggle to model human design intent, limiting their practical integration into creative workflows. To address this, we propose VIPCGRL, a novel framework that establishes, for the first time, a shared multimodal embedding space jointly representing textual descriptions, level layouts, and hand-drawn sketches. The space is trained via quadruple contrastive learning with cross-modal style alignment, enabling human-style-aware generative control. The method combines deep reinforcement learning with this multimodal semantic alignment through a similarity-based auxiliary reward, supporting policy optimization conditioned on heterogeneous inputs (text, sketch, layout). Experiments show that VIPCGRL significantly outperforms state-of-the-art baselines: it achieves a +23.6% improvement in style-consistency score and an 87.4% expert preference rate in human evaluation. These results demonstrate its effectiveness in enhancing human-AI collaboration and creative utility in procedural level generation.
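The summary does not specify the form of the quadruple contrastive objective. A plausible minimal sketch, assuming a standard pairwise InfoNCE loss summed over the four embedding sets (text, level, sketch, and a human/AI style view) so that all views are pulled into one shared space; all function names and the pairwise-sum structure are illustrative assumptions, not the paper's actual loss:

```python
import numpy as np

def info_nce(a: np.ndarray, b: np.ndarray, tau: float = 0.1) -> float:
    """Standard InfoNCE between two batches of embeddings:
    a[i] and b[i] are a positive pair; all other pairs are negatives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # L2-normalise rows
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / tau                             # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)        # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))         # positives on the diagonal

def quadruple_contrastive_loss(text, level, sketch, style, tau: float = 0.1) -> float:
    """Sum pairwise InfoNCE over the four embedding sets, aligning every
    modality (and the style view) within a single shared space."""
    views = [text, level, sketch, style]
    loss = 0.0
    for i in range(len(views)):
        for j in range(i + 1, len(views)):
            loss += info_nce(views[i], views[j], tau)
    return loss
```

Summing over all six view pairs is one simple way to realise "quadruple" alignment; the actual paper may weight pairs differently or use hard-negative mining.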
📝 Abstract
Human-aligned AI is a critical component of co-creativity, as it enables models to accurately interpret human intent and generate controllable outputs that align with design goals in collaborative content creation. This direction is especially relevant in procedural content generation via reinforcement learning (PCGRL), which is intended to serve as a tool for human designers. However, existing systems often fall short of exhibiting human-centered behavior, limiting the practical utility of AI-driven generation tools in real-world design workflows. In this paper, we propose VIPCGRL (Vision-Instruction PCGRL), a novel deep reinforcement learning framework that incorporates three modalities (text, levels, and sketches) to broaden the available control modalities and enhance human-likeness. We introduce a shared embedding space trained via quadruple contrastive learning across modalities and human-AI styles, and align the policy using an auxiliary reward based on embedding similarity. Experimental results show that VIPCGRL outperforms existing baselines in human-likeness, as validated by both quantitative metrics and human evaluations. The code and dataset will be available upon publication.
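The abstract's "auxiliary reward based on embedding similarity" can be sketched as reward shaping: the RL reward is augmented with the cosine similarity between the embedding of the generated level and the embedding of the conditioning input (text, sketch, or example level) in the shared space. The function name, weighting scheme, and base-reward interface below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def auxiliary_reward(level_emb: np.ndarray, target_emb: np.ndarray,
                     base_reward: float, weight: float = 0.5) -> float:
    """Shape the PCGRL reward with cosine similarity in the shared
    embedding space: higher similarity to the conditioning input
    (text, sketch, or layout embedding) yields a larger bonus."""
    cos = float(np.dot(level_emb, target_emb) /
                (np.linalg.norm(level_emb) * np.linalg.norm(target_emb) + 1e-8))
    return base_reward + weight * cos
```

In this sketch `weight` trades off task reward (e.g. playability constraints) against stylistic alignment; a level whose embedding matches the condition perfectly receives the full bonus, while an orthogonal one receives none.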