🤖 AI Summary
To address the scarcity of cross-domain annotations, poor generalizability, and low reproducibility of supervised methods in aspect-category sentiment analysis (ACSA), this paper proposes a zero-shot large language model (LLM) framework. Methodologically, it combines multiple chain-of-thought agents and introduces, for the first time, a token-level uncertainty quantification mechanism that dynamically weights each agent's output by its uncertainty score. Using Llama and Qwen models at the 3B and 70B+ parameter scales, the framework achieves fine-grained sentiment classification without labeled data. Contributions include: (1) the first application of token-level uncertainty to assess decision reliability in zero-shot sentiment classification, significantly mitigating performance degradation under domain shift; and (2) empirical validation across model scales demonstrating concurrent improvements in prediction stability and accuracy, establishing a robust, reproducible solution for low-resource ACSA.
📝 Abstract
Aspect-category sentiment analysis provides granular insights by identifying specific themes within product reviews that are associated with particular opinions. Supervised learning approaches dominate the field. However, data is scarce and expensive to annotate for new domains. We argue that leveraging large language models in a zero-shot setting is beneficial where the time and resources required for dataset annotation are limited. Furthermore, annotation bias may produce strong supervised results that nonetheless transfer poorly to new domains in contexts that lack annotations and demand reproducibility. In our work, we propose novel techniques that combine multiple chain-of-thought agents by leveraging large language models' token-level uncertainty scores. We experiment with the 3B and 70B+ parameter size variants of Llama and Qwen models, demonstrating how these approaches can fulfil practical needs and opening a discussion on how to gauge accuracy in label-scarce conditions.
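The core mechanism described above, weighting each chain-of-thought agent's prediction by a confidence derived from its token-level uncertainty, can be sketched roughly as follows. This is a minimal illustration, not the authors' exact method: the `confidence` function (mean per-token probability recovered from log-probabilities) and the weighted-vote aggregation are simplifying assumptions, and real agents would return log-probs from an LLM's generation API rather than the hard-coded values shown here.

```python
import math
from collections import defaultdict

def confidence(token_logprobs):
    # Hypothetical certainty score: mean per-token probability over the
    # tokens of the agent's answer. Higher means lower model uncertainty.
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

def aggregate(agent_outputs):
    # agent_outputs: list of (predicted_label, token_logprobs) pairs,
    # one per chain-of-thought agent. Each agent casts a vote weighted
    # by its confidence; the label with the largest total wins.
    votes = defaultdict(float)
    for label, logprobs in agent_outputs:
        votes[label] += confidence(logprobs)
    return max(votes, key=votes.get)

# Toy example: two fairly confident agents say "positive", one
# uncertain agent says "negative"; the weighted vote picks "positive".
outputs = [
    ("positive", [-0.1, -0.2]),
    ("negative", [-1.5, -2.0]),
    ("positive", [-0.3, -0.4]),
]
print(aggregate(outputs))  # -> positive
```

In practice the token log-probabilities would come from the model's scored output (e.g. per-token scores exposed by the inference API), and the uncertainty measure could be refined, but the sketch captures the idea of down-weighting agents whose generations the model itself was less sure about.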