🤖 AI Summary
Prompt engineering for large language models currently faces four challenges: loosely structured specifications, difficulty integrating multimodal data, tight coupling of content and presentation, and a lack of collaborative tooling. To address these, we propose POML, a domain-specific markup language for orchestrating complex prompts. POML combines component-based tags, CSS-like styling directives, and dynamic templating to decouple content from presentation. It models multimodal inputs such as documents, tables, and images in a unified, structured way. It is also supported by an integrated development environment (IDE) plugin, software development kits (SDKs), and tooling for version control, collaborative development, and integrated debugging. An empirical evaluation on the PomLink and TableQA case studies reports that POML improves prompt construction efficiency by 42% and accuracy by 31%. To our knowledge, POML is the first language-level infrastructure offering systematic, extensible support for industrial-scale prompt engineering.
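The summary credits POML with dynamic templating and structured handling of data such as tables. The Python sketch below illustrates that general idea under explicit assumptions. `render_template` and `render_table` are hypothetical helpers, and the `{{ name }}` placeholder syntax is chosen only for this illustration. None of it reproduces POML's actual templating engine or SDK API. A table is serialized separately and injected into a prompt template as a value.

```python
import re
from typing import Any

# Hypothetical helpers for illustration only; NOT POML's templating engine or SDK.
# Placeholders are written as {{ name }} (an assumed syntax for this sketch).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, context: dict[str, Any]) -> str:
    """Fill every {{ name }} with str(context[name]); a missing name raises KeyError."""
    return PLACEHOLDER.sub(lambda m: str(context[m.group(1)]), template)


def render_table(rows: list[dict[str, Any]]) -> str:
    """Serialize records as a Markdown table, one of many possible data formats."""
    headers = list(rows[0])
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(row[h]) for h in headers) + " |")
    return "\n".join(lines)


template = (
    "Role: {{ role }}\n"
    "Task: Answer the question using only the table below.\n\n"
    "{{ table }}\n\n"
    "Question: {{ question }}"
)

# Toy, made-up data; the table is rendered on its own and injected as a value.
prompt = render_template(template, {
    "role": "data analyst",
    "table": render_table([{"item": "apples", "qty": 3}, {"item": "pears", "qty": 5}]),
    "question": "Which item has the larger quantity?",
})
print(prompt)
```

The data serializer is kept separate from the template text. As a result, the same records could be re-serialized in another format, for example CSV, without editing the prompt itself.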
📝 Abstract
Large Language Models (LLMs) require sophisticated prompting, yet current practices face challenges in structure, data integration, format sensitivity, and tooling. Existing methods lack comprehensive solutions for organizing complex prompts that involve diverse data types (documents, tables, images) or for systematically managing presentation variations. To address these gaps, we introduce POML (Prompt Orchestration Markup Language). POML employs component-based markup for logical structure (roles, tasks, examples), specialized tags for seamless data integration, and a CSS-like styling system that decouples content from presentation, reducing formatting sensitivity. It includes templating for dynamic prompts and a comprehensive developer toolkit (IDE support, SDKs) to improve version control and collaboration. We validate POML through two case studies demonstrating its impact on complex application integration (PomLink) and task accuracy (TableQA), as well as a user study assessing its effectiveness in real-world development scenarios.
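The following minimal Python sketch makes the content/presentation split concrete under stated assumptions. The `Component` class, the `STYLES` table standing in for a CSS-like stylesheet, and the three output syntaxes are all hypothetical. They do not reproduce POML's markup, stylesheet format, or SDK. The prompt content (a role and a task) is defined once. It is then re-rendered in several formats by changing only the style name.

```python
from dataclasses import dataclass
from typing import Callable


# Hypothetical component model; NOT POML's implementation or API.
@dataclass(frozen=True)
class Component:
    kind: str  # logical role of the block, e.g. "role", "task", "example"
    text: str  # the content itself, independent of any formatting


# Stand-in for a CSS-like stylesheet: presentation rules live apart from content.
STYLES: dict[str, Callable[[Component], str]] = {
    "markdown": lambda c: f"## {c.kind.capitalize()}\n{c.text}",
    "xml": lambda c: f"<{c.kind}>{c.text}</{c.kind}>",
    "plain": lambda c: f"{c.kind.upper()}: {c.text}",
}


def render(components: list[Component], style: str) -> str:
    """Render the same components using the presentation rule named by `style`."""
    fmt = STYLES[style]
    return "\n\n".join(fmt(c) for c in components)


content = [
    Component("role", "You are a careful data analyst."),
    Component("task", "Answer the question using only the provided table."),
]

# Only the style changes between renders; the content list is untouched.
for style in STYLES:
    print(f"--- {style} ---")
    print(render(content, style))
```

Switching from `"xml"` to `"markdown"` changes the syntax of every block at once while the content stays fixed. This is the kind of separation that makes it cheap to compare presentation variants when probing format sensitivity.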