AI Summary
Large language models (LLMs) face challenges in political science text classification under few-shot settings, namely heavy reliance on manual prompt engineering, static in-context example selection, and opaque, uninterpretable predictions.
Method: This paper proposes a three-stage, fine-tuning-free framework: (1) task-driven automatic generation of structured prompts; (2) query-aware dynamic KNN retrieval of semantically similar examples; and (3) aggregation of multiple outputs via weighted consensus, emulating collaborative coding by multiple annotators. The framework combines meta-prompt engineering with a consensus-based ensemble mechanism.
Contribution/Results: The open-source toolkit PoliPrompt achieves an average 12.7% accuracy gain over human-crafted prompts with fixed examples across sentiment analysis, stance detection, and campaign ad tone classification. It requires no training, operates out-of-the-box, and substantially reduces manual tuning effort while enhancing interpretability and robustness.
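Stage (2), query-aware retrieval of similar examples, can be illustrated with a minimal sketch. This is not PoliPrompt's API: the function names (`embed`, `select_exemplars`, `build_prompt`), the example pool, and the toy bag-of-words embedding are all assumptions made for illustration. In practice a sentence encoder would presumably replace `embed`.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real sentence encoder)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_exemplars(query, pool, k=2):
    """Return the k labeled examples in `pool` most similar to `query`."""
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]

def build_prompt(instruction, exemplars, query):
    """Assemble a few-shot prompt from the retrieved exemplars."""
    lines = [instruction, ""]
    for text, label in exemplars:
        lines.append(f"Text: {text}\nLabel: {label}\n")
    lines.append(f"Text: {query}\nLabel:")
    return "\n".join(lines)

# Hypothetical labeled pool of campaign-ad sentences.
POOL = [
    ("the senator has a strong record on jobs", "positive"),
    ("my opponent raised taxes on working families", "negative"),
    ("the new tax plan will hurt working families", "negative"),
    ("she fought for better schools in our state", "positive"),
]

query = "this tax plan hurts families"
top = select_exemplars(query, POOL, k=2)
prompt = build_prompt("Classify the tone of the ad as positive or negative.", top, query)
```

Because the exemplars are chosen per query, a different input sentence would pull a different pair of examples from the same pool, which is the contrast with fixed in-context examples drawn above.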
Abstract
Large language models (LLMs) offer substantial promise for text classification in political science, yet their effectiveness often depends on high-quality prompts and exemplars. To address this, we introduce a three-stage framework that enhances LLM performance through automatic prompt optimization, dynamic exemplar selection, and a consensus mechanism. Our approach automates prompt refinement using task-specific exemplars, eliminating speculative trial-and-error adjustments and producing structured prompts aligned with human-defined criteria. In the second stage, we dynamically select the most relevant exemplars, ensuring contextually appropriate guidance for each query. Finally, our consensus mechanism mimics the role of multiple human coders for a single task, combining outputs from LLMs to achieve high reliability and consistency at a reduced cost. Evaluated across tasks including sentiment analysis, stance detection, and campaign ad tone classification, our method enhances classification accuracy without requiring task-specific model retraining or extensive manual adjustments to prompts. This framework not only boosts accuracy, interpretability and transparency but also provides a cost-effective, scalable solution tailored to political science applications. An open-source Python package (PoliPrompt) is available on GitHub.
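The consensus step can be sketched as a weighted vote over several LLM outputs for the same text. The abstract does not say how the outputs are weighted, so the weights below (for example, a per-path confidence or validation accuracy) and the function name `weighted_consensus` are assumptions, not the package's implementation.

```python
from collections import defaultdict

def weighted_consensus(votes):
    """Combine (label, weight) votes into one label.

    Returns the label with the largest total weight and its share of the
    total weight, which can serve as a rough agreement score.
    Ties go to the label that was seen first.
    """
    if not votes:
        raise ValueError("need at least one vote")
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_weight

# Three hypothetical outputs for one text, each with an assumed weight.
votes = [("negative", 0.9), ("positive", 0.6), ("negative", 0.7)]
label, agreement = weighted_consensus(votes)
```

A low agreement score flags texts where the outputs disagree, much as disagreement between human coders would prompt a second look.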