🤖 AI Summary
This study addresses the automated assessment of slide design flaws. Working with design experts, we develop a fine-grained, systematic taxonomy of slide design flaws and construct SlideAudit, an annotated dataset of 2,400 slides labeled by trained crowdworkers using that taxonomy. We then design taxonomy-based prompting strategies and evaluate multimodal large language models (MLLMs), comparing them against an existing design critique pipeline. Results show that current MLLMs have limited flaw detection capability (F1 = 0.331–0.655), and that taxonomy-informed prompting achieves the highest performance. In a follow-up remediation study, 82.0% of slides showed significant improvement, and 87.8% of those improved more when the taxonomy was used. Our core contribution is a principled, interpretable taxonomy and dataset for assessing slide design flaws in support of AI-assisted slide design.
📝 Abstract
Automated evaluation of specific kinds of graphic design, such as presentation slides, is an open problem. We present SlideAudit, a dataset for automated slide evaluation. We collaborated with design experts to develop a thorough taxonomy of slide design flaws. Our dataset comprises 2400 slides collected and synthesized from multiple sources, including a subset intentionally modified to contain specific design problems. We then fully annotated the slides using our taxonomy, relying on strictly trained crowdworkers recruited through Prolific. To evaluate whether AI is capable of identifying design flaws, we compared multiple large language models under different prompting strategies and against an existing design critique pipeline. We show that AI models struggle to accurately identify slide design flaws, with F1 scores ranging from 0.331 to 0.655. Notably, prompting techniques leveraging our taxonomy achieved the highest performance. We further conducted a remediation study to assess AI's potential for improving slides. Of the 82.0% of slides that showed significant improvement, 87.8% improved more when our taxonomy was used, further demonstrating its utility.
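For readers unfamiliar with the metric, the following is a minimal sketch of how an F1 score for flaw detection could be computed when each slide carries a set of flaw labels. The abstract does not say how the reported scores are averaged, so the micro-averaging used here is an assumption, and the function name `micro_f1` and the flaw labels are hypothetical rather than taken from the SlideAudit taxonomy.

```python
def micro_f1(predicted, annotated):
    """Micro-averaged F1 over per-slide sets of flaw labels.

    Assumption: counts are pooled across all slides before computing
    precision and recall. The paper may average differently.
    """
    tp = fp = fn = 0
    for pred, gold in zip(predicted, annotated):
        tp += len(pred & gold)   # flaws the model found that annotators also marked
        fp += len(pred - gold)   # flaws the model reported but annotators did not
        fn += len(gold - pred)   # annotated flaws the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels for three slides (not from the actual dataset).
annotated = [{"low_contrast", "text_overflow"}, {"misalignment"}, set()]
predicted = [{"low_contrast"}, {"misalignment", "clutter"}, {"clutter"}]

score = micro_f1(predicted, annotated)
print(f"micro-F1 = {score:.3f}")
```

In this toy case the model makes two correct detections, raises two false alarms, and misses one annotated flaw, which illustrates how both over-reporting and under-reporting pull the score down.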