Rakuten Data Release: A Large-Scale and Long-Term Reviews Corpus for Hotel Domain

📅 2025-12-17
🤖 AI Summary
Background: The hotel domain has long lacked large-scale, richly structured, multi-year review corpora. Method: The authors construct a corpus spanning 16 years (2009–2024) that comprises 7.3 million user reviews and the corresponding accommodation responses. Each record carries detailed metadata (e.g., room type, purpose, accompanying group, and aspect-level ratings), and the review–response pairs capture two-way interaction between guests and accommodations. Statistical analyses are used to identify key drivers of data drift between 2019 and 2024 (e.g., pandemic impacts, platform policy shifts). Contribution/Results: Reviewer IDs are anonymized before release. The corpus supports NLP model training and evaluation, and the analysis shows how user review behavior and accommodation response strategies changed over time.

📝 Abstract
This paper presents a large-scale corpus of Rakuten Travel Reviews. The collection contains 7.3 million customer reviews spanning 16 years, from 2009 to 2024. Each record contains the review text, the accommodation's response, an anonymized reviewer ID, the review date, accommodation ID, plan ID, plan title, room type, room name, purpose of stay, accompanying group, and user ratings for several aspect categories, as well as an overall score. We present statistical information about the corpus and use statistical approaches to identify factors driving data drift between 2019 and 2024.

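To make the record layout described in the abstract concrete, here is a minimal sketch of one record as a Python dataclass, plus a helper that counts reviews per year. The field names, types, label values, and aspect-category keys are hypothetical choices for illustration; they are not the release's actual column names or file format.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class ReviewRecord:
    """One review record; field names and types are hypothetical, not the official schema."""
    review_text: str
    response_text: Optional[str]  # accommodation's reply; None if it did not respond
    reviewer_id: str              # anonymized reviewer ID
    review_date: date
    accommodation_id: str
    plan_id: str
    plan_title: str
    room_type: str
    room_name: str
    purpose: str                  # purpose of stay (label values are hypothetical)
    accompanying_group: str       # who the reviewer traveled with (hypothetical labels)
    overall_score: int
    # aspect category -> rating; the category names used below are hypothetical
    aspect_ratings: Dict[str, int] = field(default_factory=dict)


def count_by_year(records):
    """Count reviews per calendar year, e.g. to inspect coverage of the 2009-2024 span."""
    return Counter(r.review_date.year for r in records)


# A made-up example record, not taken from the corpus.
example = ReviewRecord(
    review_text="Clean room and friendly staff.",
    response_text="Thank you for staying with us.",
    reviewer_id="anon_0001",
    review_date=date(2019, 5, 1),
    accommodation_id="h_001",
    plan_id="p_001",
    plan_title="Room only",
    room_type="twin",
    room_name="Standard Twin",
    purpose="leisure",
    accompanying_group="couple",
    overall_score=5,
    aspect_ratings={"service": 5, "cleanliness": 4},
)
print(count_by_year([example]))
```

Keeping the aspect ratings in a mapping rather than fixed fields lets the sketch tolerate aspect categories whose exact set is not specified here.
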
Problem

Research questions and friction points this paper is trying to address.

Presents a large-scale hotel review corpus
Analyzes factors driving data drift between 2019 and 2024
Provides multi-aspect ratings and metadata insights

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale hotel review corpus with 7.3 million entries
Long-term data spanning 16 years from 2009 to 2024
Statistical analysis to identify factors causing data drift
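
The page states only that drift between 2019 and 2024 is analyzed with statistical approaches; it does not name the tests used. As an illustrative stand-in (not necessarily the authors' method), one common way to quantify how much a categorical field such as purpose of stay shifts between two years is the Jensen–Shannon divergence of the two yearly label distributions. The function names and sample labels below are hypothetical.

```python
import math
from collections import Counter


def _distribution(labels, support):
    """Relative frequency of each label in `support` (a fixed, shared ordering)."""
    if not labels:
        raise ValueError("need at least one label")
    counts = Counter(labels)
    total = len(labels)
    return [counts[s] / total for s in support]


def _kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(labels_a, labels_b):
    """Jensen-Shannon divergence (base 2, so in [0, 1]) between two label samples."""
    support = sorted(set(labels_a) | set(labels_b))
    p = _distribution(labels_a, support)
    q = _distribution(labels_b, support)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


# Hypothetical "purpose" labels for two years; not real corpus statistics.
purposes_2019 = ["leisure"] * 8 + ["business"] * 2
purposes_2020 = ["leisure"] * 5 + ["business"] * 5
print(round(js_divergence(purposes_2019, purposes_2020), 4))
```

A value of 0 means the two years have identical label distributions and 1 means they share no labels; because the measure is symmetric and bounded, it can be computed per field and per year pair and compared directly.
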
Yuki Nakayama
Rakuten Institute of Technology, Rakuten Group, Inc.
Koki Hikichi
Travel & Mobility Business, Rakuten Group, Inc.
Yun Ching Liu
Rakuten Institute of Technology, Rakuten Group, Inc.
Yu Hirate
Rakuten Institute of Technology, Rakuten, Inc.