Beyond Stars: Bridging the Gap Between Ratings and Review Sentiment with LLM

📅 2025-09-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional star ratings inadequately capture fine-grained sentiment and semantic nuances in app reviews, while general-purpose NLP methods suffer from limited modeling capacity for sarcasm, domain-specific terminology, and contextual sensitivity. To address these limitations, we propose a modular large language model (LLM)-based analytical framework that integrates structured prompt engineering with retrieval-augmented dialogue question-answering (RAG-QA) to achieve precise alignment between numerical ratings and textual sentiment. Our key contributions are: (1) an interpretable, structured prompt template that explicitly guides the LLM to identify sentiment polarity, intensity, and attribution dimensions; (2) a cross-review retrieval augmentation mechanism enhancing contextual robustness; and (3) support for fine-grained feature extraction and interactive exploration. Evaluated on AWARE, Google Play, and Spotify datasets, our method significantly outperforms state-of-the-art baselines, improving sentiment analysis accuracy by 8.2–14.7%, thereby delivering high-fidelity, actionable user feedback insights for app optimization.
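The summary's first contribution is a structured prompt template that steers the LLM toward three explicit dimensions: sentiment polarity, intensity, and attribution. The paper does not publish its template verbatim, so the sketch below is an illustrative assumption of what such a prompt builder might look like; the field names follow the dimensions named above, but the wording and JSON schema are hypothetical.

```python
import json

def build_review_prompt(review_text: str, rating: int) -> str:
    """Assemble a structured sentiment-analysis prompt for an LLM.

    The output schema (polarity, intensity, attribution) mirrors the
    dimensions named in the summary; the exact template wording is an
    illustrative assumption, not the paper's verbatim prompt.
    """
    schema = {
        "polarity": "positive | negative | mixed",
        "intensity": "1 (mild) to 5 (strong)",
        "attribution": "the app feature the sentiment refers to",
    }
    return (
        "You are an app-review analyst.\n"
        f"Star rating given by the user: {rating}\n"
        f"Review text: {review_text}\n"
        "Return a JSON object with exactly these fields:\n"
        + json.dumps(schema, indent=2)
    )

prompt = build_review_prompt("Great UI, but it crashes on startup.", 2)
```

Constraining the model to a fixed output schema is what makes the extraction interpretable and machine-parseable downstream.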

📝 Abstract
We present an advanced approach to mobile app review analysis aimed at addressing limitations inherent in traditional star-rating systems. Star ratings, although intuitive and popular among users, often fail to capture the nuanced feedback present in detailed review texts. Traditional NLP techniques, such as lexicon-based methods and classical machine learning classifiers, struggle to interpret contextual nuances, domain-specific terminology, and subtle linguistic features like sarcasm. To overcome these limitations, we propose a modular framework leveraging large language models (LLMs) enhanced by structured prompting techniques. Our method quantifies discrepancies between numerical ratings and textual sentiment, extracts detailed, feature-level insights, and supports interactive exploration of reviews through retrieval-augmented conversational question answering (RAG-QA). Comprehensive experiments conducted on three diverse datasets (AWARE, Google Play, and Spotify) demonstrate that our LLM-driven approach significantly surpasses baseline methods, yielding improved accuracy, robustness, and actionable insights in challenging and context-rich review scenarios.
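The abstract's cross-review retrieval augmentation pulls related reviews into context before the LLM answers a question. The paper does not specify its retriever, so the following is a minimal bag-of-words cosine-similarity sketch standing in for whatever embedding-based retrieval the framework actually uses; the corpus and query are invented examples.

```python
from collections import Counter
import math

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k reviews most similar to the query (toy retriever)."""
    q = Counter(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda d: cosine(q, Counter(d.lower().split())),
        reverse=True,
    )
    return ranked[:k]

corpus = [
    "app crashes after the latest update",
    "love the playlists and offline mode",
    "crashes on startup every time",
]
context = retrieve("why does the app crash on startup", corpus)
# The retrieved reviews would be prepended to the RAG-QA prompt as context.
```

A production system would use dense embeddings rather than raw term counts, but the retrieve-then-answer flow is the same.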
Problem

Research questions and friction points this paper is trying to address.

Addressing limitations of star ratings in capturing nuanced feedback from reviews
Overcoming NLP techniques' struggles with contextual nuances and sarcasm
Quantifying discrepancies between numerical ratings and textual sentiment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging LLMs with structured prompting techniques
Quantifying discrepancies between ratings and textual sentiment
Using retrieval-augmented conversational QA for review exploration
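The second innovation, quantifying rating-sentiment discrepancies, can be sketched by projecting the extracted (polarity, intensity) pair onto the 1-5 star scale and taking the gap to the user's actual rating. The paper states only that discrepancies are quantified, not how, so this particular mapping is an illustrative assumption.

```python
def sentiment_to_scale(polarity: str, intensity: int) -> float:
    """Map (polarity, intensity 1-5) onto the 1-5 star scale.

    Assumed linear mapping: positive text lands in 3.4-5.0,
    negative text in 1.0-2.6, mixed/neutral at 3.0.
    """
    if polarity == "positive":
        return 3.0 + intensity / 2.5
    if polarity == "negative":
        return 3.0 - intensity / 2.5
    return 3.0

def discrepancy(star_rating: int, polarity: str, intensity: int) -> float:
    """Absolute gap between the given stars and the text-inferred score."""
    return abs(star_rating - sentiment_to_scale(polarity, intensity))

# A 5-star review whose text is strongly negative gets a large gap,
# flagging the rating as inconsistent with the review content.
gap = discrepancy(5, "negative", 4)
```

Reviews with a large gap are exactly the ones where stars alone mislead, which is the misalignment the framework is built to surface.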
Najla Zuhir
College of Computing & Information Technology, University of Doha for Science and Technology, Doha, Qatar
Amna Mohammad Salim
College of Computing & Information Technology, University of Doha for Science and Technology, Doha, Qatar
Parvathy Premkumar
College of Computing & Information Technology, University of Doha for Science and Technology, Doha, Qatar
Moshiur Farazi
University of Doha for Science and Technology, Australian National University
Computer Vision · Vision-Language Models · Applied AI