Multimodal Fine-grained Reasoning for Post Quality Evaluation

📅 2025-07-21
🤖 AI Summary
Existing post quality assessment methods suffer from three key limitations: (1) unimodal modeling overlooks complementary multimodal cues; (2) deep multimodal fusion introduces modality-specific noise; and (3) complex semantic relationships, such as relevance and comprehensiveness, are not captured at a fine-grained level. To address these, we propose a fine-grained multimodal reasoning framework that reformulates the task as a multimodal ranking problem for the first time. Our approach introduces a maximum-information fusion mechanism guided by the information bottleneck principle to suppress modality noise, and integrates two modules: (i) local-global attention for contextualized feature aggregation, and (ii) macro-micro evidence reasoning that emulates human cognitive processes for nuanced quality discrimination. The model is optimized end-to-end using ranking-aware objectives (e.g., NDCG). Extensive experiments on four benchmarks (three newly constructed multimodal topic-post datasets and the public Lazada-Home dataset) show significant improvements over state-of-the-art methods, with up to a 9.52% NDCG@3 gain over the best unimodal method on the Art History dataset, supporting both the effectiveness and the generalizability of the approach.
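The summary does not say which ranking-aware objective is used, and NDCG itself is not differentiable, so training usually relies on a surrogate. As a minimal sketch only, the block below uses a ListNet-style top-one cross-entropy, a standard listwise loss swapped in here for illustration and not confirmed as the authors' objective. The function names and the example scores and labels are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_loss(scores, labels):
    """Cross-entropy between the top-one distributions induced by
    the quality labels (target) and the model scores (prediction).

    Assumption: a ListNet-style surrogate, not the paper's stated loss.
    """
    target = softmax([float(label) for label in labels])
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

# Hypothetical quality labels for four posts under one topic.
labels = [3, 1, 0, 2]
good_scores = [3.0, 1.0, 0.0, 2.0]  # ordering agrees with the labels
bad_scores = [0.0, 2.0, 3.0, 1.0]   # ordering is reversed

print(listnet_loss(good_scores, labels))
print(listnet_loss(bad_scores, labels))
```

Scores that order the posts the same way as the labels yield the lower loss, which is the property a ranking-aware objective needs.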

📝 Abstract
Accurately assessing post quality requires complex relational reasoning to capture nuanced topic-post relationships. However, existing studies face three major limitations: (1) treating the task as unimodal categorization, which fails to leverage multimodal cues and fine-grained quality distinctions; (2) introducing noise during deep multimodal fusion, leading to misleading signals; and (3) lacking the ability to capture complex semantic relationships like relevance and comprehensiveness. To address these issues, we propose the Multimodal Fine-grained Topic-post Relational Reasoning (MFTRR) framework, which mimics human cognitive processes. MFTRR reframes post-quality assessment as a ranking task and incorporates multimodal data to better capture quality variations. It consists of two key modules: (1) the Local-Global Semantic Correlation Reasoning Module, which models fine-grained semantic interactions between posts and topics at both local and global levels, enhanced by a maximum information fusion mechanism to suppress noise; and (2) the Multi-Level Evidential Relational Reasoning Module, which explores macro- and micro-level relational cues to strengthen evidence-based reasoning. We evaluate MFTRR on three newly constructed multimodal topic-post datasets and the public Lazada-Home dataset. Experimental results demonstrate that MFTRR significantly outperforms state-of-the-art baselines, achieving up to 9.52% NDCG@3 improvement over the best unimodal method on the Art History dataset.
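The reported result is stated in NDCG@3, so a short worked computation may help in reading it. The sketch below assumes the common exponential-gain form of DCG; the abstract does not state which variant the paper uses, and the labels and scores are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list.

    Assumption: exponential gain (2**rel - 1) with a log2 position discount.
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(scores, labels, k=3):
    """NDCG@k: DCG of the predicted ordering divided by the ideal DCG."""
    # Rank the posts by predicted score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [labels[i] for i in order]
    # The ideal ordering sorts the posts by their true quality labels.
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical quality labels and model scores for four posts on one topic.
labels = [3, 1, 0, 2]
scores = [0.9, 0.2, 0.4, 0.7]

print(ndcg_at_k(scores, labels, k=3))
```

Here the model places the two best posts first but ranks the worst post third, so the score falls slightly below 1.0. A perfect ordering gives exactly 1.0.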
Problem

Research questions and friction points this paper is trying to address.

Accurately assessing post quality with multimodal cues
Reducing noise in deep multimodal fusion processes
Capturing complex semantic relationships such as relevance and comprehensiveness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal ranking task for quality assessment
Local-global semantic correlation reasoning module
Multi-level evidential relational reasoning module
Authors

Xiaoxu Guo
Siyan Liang
Yachao Cui
Juxiang Zhou
Lei Wang
Han Cao
Data Scientist at Inspectorio
Machine Learning, Deep Learning, Data Mining, NLP, Computer Vision