🤖 AI Summary
Traditional star ratings inadequately capture fine-grained sentiment and semantic nuances in app reviews, while general-purpose NLP methods have limited capacity to model sarcasm, domain-specific terminology, and contextual sensitivity. To address these limitations, we propose a modular large language model (LLM)-based analytical framework that integrates structured prompt engineering with retrieval-augmented conversational question answering (RAG-QA) to achieve precise alignment between numerical ratings and textual sentiment. Our key contributions are: (1) an interpretable, structured prompt template that explicitly guides the LLM to identify sentiment polarity, intensity, and attribution dimensions; (2) a cross-review retrieval augmentation mechanism that enhances contextual robustness; and (3) support for fine-grained feature extraction and interactive exploration. Evaluated on the AWARE, Google Play, and Spotify datasets, our method significantly outperforms state-of-the-art baselines, improving sentiment analysis accuracy by 8.2–14.7% and delivering high-fidelity, actionable user feedback insights for app optimization.
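Contribution (1), the structured prompt template, can be sketched as follows. This is a minimal illustration, not the authors' actual template: the field names, the 1–5 intensity scale, and the JSON output contract are all assumptions introduced for the example.

```python
import json

# Hypothetical structured prompt in the spirit of contribution (1): the keys
# ("polarity", "intensity", "aspects", "rating_text_mismatch") are illustrative
# assumptions, not the paper's exact schema.
PROMPT_TEMPLATE = """You are an app-review analyst.
Review: "{review}"
Star rating: {stars}/5

Return a JSON object with exactly these keys:
- "polarity": one of "positive", "negative", "mixed", "neutral"
- "intensity": integer from 1 (mild) to 5 (strong)
- "aspects": list of {{"feature": str, "sentiment": str}} attributions
- "rating_text_mismatch": true if the text sentiment contradicts the stars
"""

def build_prompt(review: str, stars: int) -> str:
    """Fill the structured template for a single review."""
    return PROMPT_TEMPLATE.format(review=review, stars=stars)

def parse_response(raw: str) -> dict:
    """Parse and minimally validate the model's JSON output."""
    out = json.loads(raw)
    assert out["polarity"] in {"positive", "negative", "mixed", "neutral"}
    assert 1 <= int(out["intensity"]) <= 5
    return out
```

Constraining the output to a fixed JSON schema is what makes the rating/text-mismatch signal machine-checkable downstream; the `rating_text_mismatch` flag directly operationalizes the paper's goal of aligning numerical ratings with textual sentiment.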
📝 Abstract
We present an advanced approach to mobile app review analysis aimed at addressing limitations inherent in traditional star-rating systems. Star ratings, although intuitive and popular among users, often fail to capture the nuanced feedback present in detailed review texts. Traditional NLP techniques -- such as lexicon-based methods and classical machine learning classifiers -- struggle to interpret contextual nuances, domain-specific terminology, and subtle linguistic features like sarcasm. To overcome these limitations, we propose a modular framework leveraging large language models (LLMs) enhanced by structured prompting techniques. Our method quantifies discrepancies between numerical ratings and textual sentiment, extracts detailed, feature-level insights, and supports interactive exploration of reviews through retrieval-augmented conversational question answering (RAG-QA). Comprehensive experiments conducted on three diverse datasets (AWARE, Google Play, and Spotify) demonstrate that our LLM-driven approach significantly surpasses baseline methods, yielding improved accuracy, robustness, and actionable insights in challenging and context-rich review scenarios.
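The cross-review retrieval step behind the RAG-QA component can be illustrated with a toy sketch. The paper does not specify a retriever here; a real system would likely use dense embeddings and a vector index, so the term-frequency cosine similarity below is a stand-in assumption chosen to keep the example self-contained.

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    """Crude bag-of-words vector; a dense embedding model would replace this."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question: str, reviews: list[str], k: int = 2) -> list[str]:
    """Return the k reviews most similar to the question."""
    q = _vec(question)
    ranked = sorted(reviews, key=lambda r: cosine(q, _vec(r)), reverse=True)
    return ranked[:k]

def qa_prompt(question: str, reviews: list[str]) -> str:
    """Assemble a grounded QA prompt from the retrieved reviews."""
    context = "\n".join(f"- {r}" for r in retrieve(question, reviews))
    return (f"Answer using only these user reviews:\n{context}\n"
            f"Question: {question}\nAnswer:")
```

Grounding the conversational answer in retrieved sibling reviews is what gives the framework its contextual robustness: the LLM answers from evidence drawn across the review corpus rather than from a single review in isolation.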