Towards Personalized Bangla Book Recommendation: A Large-Scale Multi-Entity Book Graph Dataset

📅 2026-02-12
🤖 AI Summary
This study addresses the scarcity of structured, large-scale public datasets for personalized book recommendation in Bangla, a low-resource language. To bridge this gap, the authors construct RokomariBG, a large-scale Bangla book knowledge graph dataset that covers multiple entity types (books, users, authors, categories, publishers) and the relations among them, along with associated textual side information. Using this resource, they systematically evaluate a range of recommendation approaches on the Top-N recommendation task, including collaborative filtering, matrix factorization, content-based methods, graph neural networks, a hybrid matrix factorization model with side information, and a neural two-tower retrieval model. The neural retrieval model achieves the strongest performance (NDCG@10 = 0.204), and the results highlight the importance of multi-relational graph structure and textual features for recommendation accuracy. The work establishes a reproducible benchmark for recommender systems research in low-resource cultural domains.

📝 Abstract
Personalized book recommendation in Bangla literature has been constrained by the lack of structured, large-scale, and publicly available datasets. This work introduces RokomariBG, a large-scale, multi-entity heterogeneous book graph dataset designed to support research on personalized recommendation in a low-resource language setting. The dataset comprises 127,302 books, 63,723 users, 16,601 authors, 1,515 categories, 2,757 publishers, and 209,602 reviews, connected through eight relation types and organized as a comprehensive knowledge graph. To demonstrate the utility of the dataset, we provide a systematic benchmarking study on the Top-N recommendation task, evaluating a diverse set of representative recommendation models, including classical collaborative filtering methods, matrix factorization models, content-based approaches, graph neural networks, a hybrid matrix factorization model with side information, and a neural two-tower retrieval architecture. The benchmarking results highlight the importance of leveraging multi-relational structure and textual side information, with neural retrieval models achieving the strongest performance (NDCG@10 = 0.204). Overall, this work establishes a foundational benchmark and a publicly available resource for Bangla book recommendation research, enabling reproducible evaluation and future studies on recommendation in low-resource cultural domains. The dataset and code are publicly available at https://github.com/backlashblitz/Bangla-Book-Recommendation-Dataset
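The headline metric above, NDCG@10, can be illustrated with a minimal sketch. The abstract does not state the exact formulation used in the benchmark, so this version assumes binary relevance (a held-out book is either relevant or not) and a log2 rank discount; the function names `dcg_at_k` and `ndcg_at_k` are hypothetical and are not taken from the released code.

```python
import math


def dcg_at_k(gains, k):
    """Discounted cumulative gain over the first k positions (rank 1 has discount 1)."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_items, relevant_items, k=10):
    """NDCG@k with binary relevance (an assumption, not the paper's stated setup).

    ranked_items: recommended item ids, best first.
    relevant_items: set of held-out item ids the user actually interacted with.
    """
    gains = [1.0 if item in relevant_items else 0.0 for item in ranked_items]
    # Ideal ranking places every relevant item at the top of the list.
    ideal_gains = [1.0] * min(len(relevant_items), k)
    idcg = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


# Illustrative ids only; these are not items from RokomariBG.
score = ndcg_at_k(["book_b", "book_a", "book_c"], {"book_a"}, k=10)
```

Here the single relevant book sits at rank 2, so the score is 1 / log2(3). A benchmark-level figure such as the reported 0.204 would typically be the mean of this per-user score over all test users.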
Problem

Research questions and friction points this paper is trying to address.

personalized recommendation
Bangla literature
low-resource language
book recommendation
dataset scarcity
Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneous knowledge graph
low-resource language
personalized recommendation
graph neural networks
benchmark dataset
Rahin Arefin Ahmed
East West University, Dhaka, Bangladesh
Md. Anik Chowdhury
East West University, Dhaka, Bangladesh
Sakil Ahmed Sheikh Reza
East West University, Dhaka, Bangladesh
Devnil Bhattacharjee
East West University, Dhaka, Bangladesh
Muhammad Abdullah Adnan
Bangladesh University of Engineering and Technology (BUET), Dhaka, Bangladesh
Cloud Computing, Distributed Computing, Distributed Machine Learning, Artificial Intelligence, NLP
Nafis Sadeq
Assistant Professor, East West University
Natural Language Processing, Recommender Systems, Speech Recognition