The importance of visual modelling languages in generative software engineering

📅 2024-11-27
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study targets the underexplored role of multimodal understanding in generative software engineering (SE). We present the first systematic investigation of GPT-4's multimodal interface for integrating visual modeling languages, specifically UML class and sequence diagrams, with natural-language prompts. We propose a novel prompting paradigm in which visual modeling artifacts serve as a critical modality, and design a task-driven evaluation framework to empirically assess its effectiveness across three core SE tasks: requirements understanding, code generation, and architectural reasoning. Experimental results show that multimodal (image-text) prompting significantly outperforms text-only baselines, yielding an average accuracy improvement of 37%. These findings confirm the indispensable role of visual modalities in generative SE and fill a critical research gap concerning the systematic application of multimodal large language models in software engineering.
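The image-text prompting paradigm summarized above can be illustrated with a short sketch. The snippet below, assuming the OpenAI Python client, is only illustrative: the model name (gpt-4o), the file path, and the prompt wording are assumptions, not details taken from the paper.

```python
# Minimal sketch of an image-text prompt pairing a UML class diagram with a
# natural-language instruction. Model name, file path, and prompt wording are
# illustrative assumptions; the paper does not prescribe this exact setup.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Encode the UML class diagram image so it can be embedded in the request.
with open("uml_class_diagram.png", "rb") as f:
    diagram_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="gpt-4o",  # any GPT-4 variant with vision support
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Generate Python classes that implement this UML class "
                     "diagram, preserving associations and multiplicities."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{diagram_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```

A text-only baseline of the kind the summary compares against would omit the image part and describe the diagram in prose instead.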

📝 Abstract
Multimodal GPTs represent a watershed in the interplay between Software Engineering and Generative Artificial Intelligence. GPT-4 accepts image and text inputs, rather than simply natural language. We investigate relevant use cases stemming from these enhanced capabilities of GPT-4. To the best of our knowledge, no other work has investigated similar use cases involving Software Engineering tasks carried out via multimodal GPTs prompted with a mix of diagrams and natural language.
Problem

Research questions and friction points this paper is trying to address.

GPT-4
Software Engineering
Image-Text Integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPT-4
Multimodal Capability
Software Engineering Tasks