Enhancing Sentiment Classification and Irony Detection in Large Language Models through Advanced Prompt Engineering Techniques

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge that large language models inadequately capture subtle semantic distinctions in sentiment classification and irony detection. It systematically evaluates and optimizes the performance of GPT-4o-mini and gemini-1.5-flash on sentiment analysis tasks using advanced prompting techniques, including few-shot prompting, chain-of-thought (CoT), and self-consistency. Performance is assessed by accuracy, precision, recall, and F1 score. The findings reveal that effective prompting strategies must be tailored to both model architecture and task complexity: GPT-4o-mini achieves its best results under few-shot settings, while gemini-1.5-flash exhibits up to a 46% improvement in irony detection with CoT prompting. This work provides practical guidance for leveraging lightweight large language models in fine-grained sentiment understanding.

📝 Abstract
This study investigates the use of prompt engineering to enhance large language models (LLMs), specifically GPT-4o-mini and gemini-1.5-flash, on sentiment analysis tasks. It evaluates advanced prompting techniques, namely few-shot learning, chain-of-thought prompting, and self-consistency, against a baseline. Key tasks include sentiment classification, aspect-based sentiment analysis, and detecting subtle nuances such as irony. The research details the theoretical background, datasets, and methods used, measuring LLM performance by accuracy, precision, recall, and F1 score. Findings reveal that advanced prompting significantly improves sentiment analysis: the few-shot approach excels for GPT-4o-mini, while chain-of-thought prompting boosts irony detection in gemini-1.5-flash by up to 46%. That the best-performing technique differs by model and by task indicates that prompting strategies must be tailored to both the LLM's architecture and the semantic complexity of the task, rather than applied uniformly.
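The three prompting strategies named in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the model call is stubbed, and the function names, prompt wording, and label set are illustrative assumptions.

```python
from collections import Counter

def few_shot_prompt(examples, text):
    """Few-shot prompting: prepend labeled demonstrations to the query."""
    lines = [f'Review: "{ex}"\nSentiment: {label}' for ex, label in examples]
    lines.append(f'Review: "{text}"\nSentiment:')
    return "\n\n".join(lines)

def cot_prompt(text):
    """Chain-of-thought prompting: ask the model to reason before labeling."""
    return (
        f'Review: "{text}"\n'
        "Think step by step about the literal meaning, the context, and any "
        "mismatch between them that signals irony, then answer with exactly "
        "one label: positive, negative, or ironic."
    )

def self_consistency(ask_model, prompt, n_samples=5):
    """Self-consistency: sample several completions and take the majority label."""
    votes = Counter(ask_model(prompt) for _ in range(n_samples))
    return votes.most_common(1)[0][0]

# Usage with a stubbed model (a real setup would call an LLM API here).
examples = [("Great phone, love it", "positive"),
            ("Broke after a day", "negative")]
prompt = few_shot_prompt(examples, "Battery lasts forever")

sampled = iter(["ironic", "positive", "ironic", "ironic", "negative"])
label = self_consistency(lambda _p: next(sampled), cot_prompt("What a joy, another Monday"))
```

In practice the paper combines these ideas per model and per task; the sketch only shows the mechanics of building the prompts and aggregating sampled answers by majority vote.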
Problem

Research questions and friction points this paper is trying to address.

sentiment classification
irony detection
large language models
prompt engineering
aspect-based sentiment analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

prompt engineering
few-shot learning
chain-of-thought prompting
sentiment analysis
irony detection
Marvin Schmitt
ELLIS
Generative Neural Networks · Probabilistic ML · Uncertainty Quantification · Simulation Intelligence
Anne Schwerk
IU International University of Applied Sciences, Juri-Gagarin-Ring 152, Erfurt, 99084, Thuringia, Germany
Sebastian Lempert
IU International University of Applied Sciences, Juri-Gagarin-Ring 152, Erfurt, 99084, Thuringia, Germany