Enhancing LLM-Based Text Classification in Political Science: Automatic Prompt Optimization and Dynamic Exemplar Selection for Few-Shot Learning

📅 2024-09-02

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

Problem: In few-shot settings, large language models (LLMs) applied to political science text classification rely heavily on manual prompt engineering, use static in-context examples, and produce predictions that are hard to interpret. Method: This paper proposes a three-stage, fine-tuning-free framework: (1) task-driven automatic generation of structured prompts; (2) query-aware dynamic KNN retrieval of semantically similar examples; and (3) aggregation of multiple outputs via weighted consensus, emulating collaborative coding by multiple annotators. The framework combines meta-prompt engineering with a consensus-based ensemble. Contribution/Results: The open-source toolkit PoliPrompt achieves an average 12.7% accuracy gain over human-crafted prompts with fixed examples across sentiment analysis, stance detection, and campaign ad tone classification. It requires no training, works out of the box, and substantially reduces manual tuning effort while improving interpretability and robustness.

📝 Abstract

Large language models (LLMs) offer substantial promise for text classification in political science, yet their effectiveness often depends on high-quality prompts and exemplars. To address this, we introduce a three-stage framework that enhances LLM performance through automatic prompt optimization, dynamic exemplar selection, and a consensus mechanism. Our approach automates prompt refinement using task-specific exemplars, eliminating speculative trial-and-error adjustments and producing structured prompts aligned with human-defined criteria. In the second stage, we dynamically select the most relevant exemplars, ensuring contextually appropriate guidance for each query. Finally, our consensus mechanism mimics the role of multiple human coders for a single task, combining outputs from LLMs to achieve high reliability and consistency at a reduced cost. Evaluated across tasks including sentiment analysis, stance detection, and campaign ad tone classification, our method enhances classification accuracy without requiring task-specific model retraining or extensive manual adjustments to prompts. This framework not only boosts accuracy, interpretability and transparency but also provides a cost-effective, scalable solution tailored to political science applications. An open-source Python package (PoliPrompt) is available on GitHub.
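
The dynamic exemplar selection stage described above can be illustrated with a minimal sketch. This is not PoliPrompt's API: the function names (`embed`, `select_exemplars`, `build_prompt`), the toy bag-of-words embedding, and the example texts are all assumptions made for illustration. A real pipeline would presumably use a sentence-embedding model in place of `embed`.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a hypothetical stand-in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def select_exemplars(query, pool, k=2):
    """Return the k labeled examples most similar to the query (nearest neighbors)."""
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]

def build_prompt(instruction, query, exemplars):
    """Assemble a few-shot prompt from the selected exemplars."""
    shots = "\n".join(f"Text: {text}\nLabel: {label}" for text, label in exemplars)
    return f"{instruction}\n\n{shots}\n\nText: {query}\nLabel:"

# Hypothetical labeled pool (invented for this sketch, not from the paper).
pool = [
    ("The senator's tax plan will hurt working families", "negative"),
    ("This candidate has always fought for our schools", "positive"),
    ("The new tax plan rewards working families", "positive"),
    ("Polls open at 7 am on Tuesday", "neutral"),
]
query = "Her tax plan helps working families"
chosen = select_exemplars(query, pool, k=2)
prompt = build_prompt("Classify the tone of the text.", query, chosen)
print(prompt)
```

Because the exemplars are chosen per query, two different inputs to the same task can receive different few-shot demonstrations, which is the contrast with a fixed exemplar set.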

Problem

Research questions and friction points this paper is trying to address.

Optimizing prompts for political science text classification

Dynamically selecting relevant exemplars for few-shot learning

Enhancing classification accuracy without task-specific retraining

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatic prompt optimization using task-specific exemplars

Dynamic selection of relevant exemplars per query

Consensus mechanism combining multiple LLM outputs
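
As a rough sketch of how a consensus step over multiple LLM outputs might work, the snippet below aggregates labels by weighted vote. The source does not specify the aggregation rule or how weights are set, so the function name `weighted_consensus`, the weights, and the sample votes are all hypothetical; weights could, for example, reflect each model's accuracy on a validation set.

```python
from collections import defaultdict

def weighted_consensus(votes):
    """Aggregate (label, weight) votes.

    Returns the label with the largest total weight and its share of
    the total weight, which can serve as a simple agreement score.
    """
    totals = defaultdict(float)
    for label, weight in votes:
        totals[label] += weight
    total_weight = sum(totals.values())
    if total_weight <= 0:
        raise ValueError("votes must carry positive total weight")
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / total_weight

# Hypothetical outputs from three model runs on one text (weights are assumed).
votes = [("negative", 0.9), ("positive", 0.6), ("negative", 0.7)]
label, agreement = weighted_consensus(votes)
print(label, round(agreement, 3))
```

A low agreement score flags items where the runs disagree, much as disagreement among human coders would prompt a second look.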

🔎 Similar Papers

Prompt Selection Matters: Enhancing Text Annotations for Social Sciences with Large Language Models