Blurb-Refined Inference from Crowdsourced Book Reviews using Hierarchical Genre Mining with Dual-Path Graph Convolutions

📅 2025-12-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing genre classification methods predominantly employ flat, single-label modeling, neglecting the inherent hierarchical structure of literary genres and over-relying on noisy, subjective user reviews, which compromises reliability. This paper proposes HiGeMine, the first framework for hierarchical genre mining that jointly leverages authoritative book blurbs and noisy user reviews. It introduces a zero-shot semantic alignment filtering mechanism to enhance data quality and designs a dual-path graph convolutional network to simultaneously model genre hierarchy and label co-occurrence dependencies. The method integrates a pre-trained language model (BERT), a hierarchical label graph, and a cascaded binary-classification–multi-label architecture. Evaluated on a newly constructed hierarchical dataset, HiGeMine achieves 96.2% accuracy on Level-1 fiction/non-fiction discrimination and improves Level-2 fine-grained genre F1-score by 12.7% over baselines, demonstrating substantial robustness to label noise.
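The cascaded design mentioned in the summary (a binary fiction/non-fiction decision followed by multi-label genre prediction) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `HIERARCHY` labels, the keyword-based toy scorers, and the 0.5 thresholds are hypothetical stand-ins, not the paper's label set or trained classifiers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical two-level hierarchy; the paper's actual genre set is not listed here.
HIERARCHY: Dict[str, List[str]] = {
    "fiction": ["fantasy", "mystery", "romance"],
    "non-fiction": ["biography", "history", "science"],
}

Scorer = Callable[[str], Dict[str, float]]


def cascade_predict(
    text: str,
    level1_fiction_prob: Callable[[str], float],
    level2_scorers: Dict[str, Scorer],
    threshold: float = 0.5,
) -> Tuple[str, List[str]]:
    """Pick a Level-1 branch, then keep every Level-2 genre scoring above threshold."""
    branch = "fiction" if level1_fiction_prob(text) >= 0.5 else "non-fiction"
    scores = level2_scorers[branch](text)
    genres = sorted(g for g in HIERARCHY[branch] if scores.get(g, 0.0) >= threshold)
    return branch, genres


def toy_level1(text: str) -> float:
    """Toy stand-in for the Level-1 classifier (assumption, not a trained model)."""
    return 0.9 if "novel" in text.lower() else 0.1


def toy_keyword_scorer(keywords: Dict[str, str]) -> Scorer:
    """Toy stand-in for a Level-2 multi-label head: score 1.0 when a keyword appears."""
    def score(text: str) -> Dict[str, float]:
        lowered = text.lower()
        return {genre: (1.0 if word in lowered else 0.0) for genre, word in keywords.items()}
    return score


scorers = {
    "fiction": toy_keyword_scorer(
        {"fantasy": "dragon", "mystery": "detective", "romance": "love"}
    ),
    "non-fiction": toy_keyword_scorer(
        {"biography": "life of", "history": "war", "science": "physics"}
    ),
}

result = cascade_predict("A novel about a detective who falls in love.", toy_level1, scorers)
print(result)  # ('fiction', ['mystery', 'romance'])
```

Restricting the Level-2 candidates to the children of the chosen Level-1 branch is what makes the prediction hierarchical: a book routed to non-fiction can never receive a fiction genre.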

📝 Abstract
Accurate book genre classification is fundamental to digital library organization, content discovery, and personalized recommendation. Existing approaches typically model genre prediction as a flat, single-label task, ignoring hierarchical genre structure and relying heavily on noisy, subjective user reviews, which often degrade classification reliability. We propose HiGeMine, a two-phase hierarchical genre mining framework that robustly integrates user reviews with authoritative book blurbs. In the first phase, HiGeMine employs a zero-shot semantic alignment strategy to filter reviews, retaining only those semantically consistent with the corresponding blurb, thereby mitigating noise, bias, and irrelevance. In the second phase, we introduce a dual-path, two-level graph-based classification architecture: a coarse-grained Level-1 binary classifier distinguishes fiction from non-fiction, followed by Level-2 multi-label classifiers for fine-grained genre prediction. Inter-genre dependencies are explicitly modeled using a label co-occurrence graph, while contextual representations are derived from pretrained language models applied to the filtered textual content. To facilitate systematic evaluation, we curate a new hierarchical book genre dataset. Extensive experiments demonstrate that HiGeMine consistently outperforms strong baselines across hierarchical genre classification tasks. The proposed framework offers a principled and effective solution for leveraging both structured and unstructured textual data in hierarchical book genre analysis.
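The first-phase filter described above keeps only reviews that are semantically consistent with the blurb. A minimal sketch of that idea follows, assuming cosine similarity against a fixed threshold. The bag-of-words `embed` function is a deliberately simple stand-in for the pretrained encoder the abstract refers to, and the threshold of 0.2 is an arbitrary illustrative value; neither is taken from the paper.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: stands in for a pretrained encoder)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_reviews(blurb: str, reviews: List[str], threshold: float = 0.2) -> List[str]:
    """Keep reviews whose similarity to the blurb reaches the (hypothetical) threshold."""
    blurb_vec = embed(blurb)
    return [r for r in reviews if cosine(blurb_vec, embed(r)) >= threshold]


blurb = "A young wizard discovers a hidden school of magic and battles a dark sorcerer."
reviews = [
    "The wizard school and the magic battles were thrilling.",
    "Shipping was slow and the cover arrived damaged.",
]
kept = filter_reviews(blurb, reviews)
print(kept)  # only the first review survives the filter
```

No labeled examples of "relevant" versus "irrelevant" reviews are needed, which is the sense in which such a filter is zero-shot. With a real sentence encoder the similarity would reflect meaning rather than shared words, and the threshold would need tuning.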
Problem

Research questions and friction points this paper is trying to address.

Hierarchical book genre classification from noisy user reviews
Integrating user reviews with authoritative blurbs to reduce noise
Modeling genre dependencies for accurate multi-label classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot semantic alignment filters noisy user reviews
Dual-path graph-based architecture models hierarchical genre classification
Label co-occurrence graph captures inter-genre dependencies for predictions
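As an illustration of the last point, a label co-occurrence graph can be estimated from training annotations and used to mix information between related genres. The sketch below is written under stated assumptions: edge weights are conditional frequencies P(genre j | genre i), a common choice in multi-label graph models but not confirmed as the paper's, and `propagate` is a weight-free simplification of a graph convolution that omits the learned weights and nonlinearity of a real layer. The genre names and book annotations are invented for the example.

```python
from typing import List


def cooccurrence_matrix(label_sets: List[List[str]], labels: List[str]) -> List[List[float]]:
    """A[i][j] estimates P(label j | label i) from co-occurrence counts."""
    index = {label: k for k, label in enumerate(labels)}
    n = len(labels)
    counts = [[0] * n for _ in range(n)]
    for genres in label_sets:
        ids = sorted({index[g] for g in genres})
        for i in ids:
            for j in ids:
                counts[i][j] += 1  # the diagonal counts how often label i occurs
    return [
        [counts[i][j] / counts[i][i] if counts[i][i] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def propagate(adjacency: List[List[float]], features: List[List[float]]) -> List[List[float]]:
    """One simplified propagation step: row-normalise A, then average neighbour features."""
    dim = len(features[0])
    out = []
    for row in adjacency:
        total = sum(row)
        weights = [w / total for w in row] if total else list(row)
        out.append([sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)])
    return out


labels = ["fantasy", "romance", "mystery"]
books = [["fantasy", "romance"], ["fantasy"], ["mystery"], ["romance", "fantasy"]]
adjacency = cooccurrence_matrix(books, labels)

# One-hot label features, so each output row shows how much each genre contributes.
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
smoothed = propagate(adjacency, one_hot)
print(smoothed[1])  # [0.5, 0.5, 0.0]: romance draws equally on itself and fantasy
```

In this toy data romance always appears with fantasy, so its representation is pulled toward fantasy, while mystery never co-occurs with anything and stays unchanged.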
Suraj Kumar
Indian Institute of Technology Indore
Utsav Kumar Nareti
Indian Institute of Technology Patna
Soumi Chattopadhyay
Indian Institute of Technology Indore
Chandranath Adak
Indian Institute of Technology Patna
Computer Vision, Deep Learning, Biometrics, Data Analytics
Prolay Mallick
Indian Institute of Technology Indore