🤖 AI Summary
To address core challenges in LLM-driven semantic data processing, including gaps in users' understanding of their data, prompt brittleness, and inefficient pipeline construction, this paper presents DocWrangler, a mixed-initiative IDE for semantic data processing. The approach introduces three novel mechanisms: (1) in-situ user notes that bridge the gap between user intent and model interpretation; (2) LLM-assisted prompt refinement that turns those notes into more robust operations; and (3) LLM-assisted operation decomposition that supports progressive construction of pipelines built from semantic operators (e.g., map, reduce, filter). Through a 10-participant think-aloud study and analysis of 1,500+ real-world sessions from a public deployment, the paper shows how users develop systematic strategies for their tasks: for example, transforming open-ended operations into easily validated classifiers, and intentionally writing vague prompts to probe the boundaries of their data and the model.
📝 Abstract
Unstructured text has long been difficult to analyze automatically at scale. Large language models (LLMs) now offer a way forward by enabling *semantic data processing*, where familiar data processing operators (e.g., map, reduce, filter) are powered by LLMs instead of code. However, building effective semantic data processing pipelines presents a departure from traditional data pipelines: users need to understand their data to write effective pipelines, yet they need to construct pipelines to extract the data necessary for that understanding, all while navigating LLM idiosyncrasies and inconsistencies. We present DocWrangler, a mixed-initiative integrated development environment (IDE) for semantic data processing with three novel features that address the gaps between the user, their data, and their pipeline: *(i) In-Situ User Notes*, which let users inspect, annotate, and track observations across documents and LLM outputs; *(ii) LLM-Assisted Prompt Refinement*, which transforms user notes into improved operations; and *(iii) LLM-Assisted Operation Decomposition*, which identifies when operations or documents are too complex for the LLM to process correctly and suggests decompositions. Our evaluation combines a think-aloud study with 10 participants and a public-facing deployment (available at https://docetl.org/playground) with 1,500+ recorded sessions, revealing how users develop systematic strategies for their semantic data processing tasks, e.g., transforming open-ended operations into classifiers for easier validation and intentionally using vague prompts to learn more about their data or LLM capabilities.