Modality-Aware Identity Construction and Counterfactual Structure Learning for ID-Free Multimodal Recommendation

📅 2026-05-18
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing ID-free multimodal recommendation methods struggle to capture long-tail semantic relationships because their identity representations are static, they underexploit multimodal semantics, and popularity bias contaminates graph learning. To address these limitations, this work proposes MAIL, a framework whose modality-aware identity construction module dynamically modulates positional encodings with multimodal semantics, yielding content-aware identity representations. MAIL also introduces a counterfactual structure learning mechanism that applies a popularity penalty to uncover semantically relevant yet low-exposure neighbors, thereby mitigating popularity bias. Experiments on five public Amazon datasets show average gains over the baseline models of 7.81% in Recall@10 and 12.81% in NDCG@10.
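The idea of modulating positional encodings with multimodal semantics can be illustrated with a minimal sketch. This is not MAIL's implementation: the sinusoidal encoding, the scale-and-shift modulation, the random matrices standing in for learned weights, and the names `sinusoidal_encoding` and `modality_aware_identity` are all assumptions made for illustration.

```python
import numpy as np

def sinusoidal_encoding(num_items, dim):
    """Fixed sinusoidal encoding of item positions (dim must be even)."""
    pos = np.arange(num_items)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / np.power(10000.0, 2 * i / dim)
    enc = np.zeros((num_items, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc

def modality_aware_identity(visual, textual, dim=8, seed=0):
    """Hypothetical scale-and-shift modulation of positional encodings."""
    rng = np.random.default_rng(seed)
    # Concatenate per-item modality features into one content vector.
    content = np.concatenate([visual, textual], axis=1)
    # Assumption: random matrices stand in for learned projection weights.
    w_scale = rng.normal(scale=0.1, size=(content.shape[1], dim))
    w_shift = rng.normal(scale=0.1, size=(content.shape[1], dim))
    scale = 1.0 + np.tanh(content @ w_scale)  # equals 1 when content is zero
    shift = content @ w_shift                 # equals 0 when content is zero
    return scale * sinusoidal_encoding(content.shape[0], dim) + shift
```

With all-zero content features the output reduces to the plain positional encoding, so any departure from it comes from the item's content, which is the sense in which such an identity would be "content-aware".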
📝 Abstract
Multimodal recommendation has attracted extensive attention by leveraging heterogeneous modality information to alleviate data sparsity and improve recommendation accuracy. Existing methods have attempted to replace ID embeddings with multimodal features and have achieved promising preliminary results. However, these methods still exhibit the following two limitations: (1) the reconstructed ID representations remain relatively static and fail to fully exploit multimodal semantics; and (2) the graph learning process is insufficient in mining latent long-tail semantic relations and is easily affected by popularity bias. To address these issues, we propose a novel method named Modality-Aware Identity Construction and Counterfactual Structure Learning for ID-free Multimodal Recommendation (MAIL). Specifically, we design a modality-aware identity construction module that dynamically modulates positional encodings with multimodal semantics to construct content-aware ID-free identity representations. Then, we propose a counterfactual structure learning paradigm that mines low-exposure semantic neighbors via popularity penalization and alleviates popularity bias. Extensive experiments are conducted on five public Amazon datasets. Experimental results show that MAIL achieves average improvements of 7.81% in Recall@10 and 12.81% in NDCG@10 compared with the baseline models. Our code is available at https://github.com/HubuKG/MAIL.
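To make the popularity-penalized neighbor mining concrete, here is a minimal sketch under stated assumptions. It is not taken from the MAIL code base: the cosine similarity, the log-popularity penalty, the weight `alpha`, and the function name `penalized_neighbors` are hypothetical choices for illustration only.

```python
import numpy as np

def penalized_neighbors(features, popularity, k=2, alpha=0.5):
    """Top-k neighbor indices per item under a popularity-penalized similarity."""
    # Cosine similarity between item feature vectors.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Assumed penalty: normalized log-popularity of the candidate neighbor.
    pen = np.log1p(popularity)
    pen = pen / max(pen.max(), 1e-12)
    score = sim - alpha * pen[None, :]
    np.fill_diagonal(score, -np.inf)  # an item is never its own neighbor
    return np.argsort(-score, axis=1)[:, :k]

# Toy data: item 1 is very popular, item 2 is similar to item 0 but rarely seen.
feats = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, 0.2], [0.0, 1.0]])
pop = np.array([10, 1000, 5, 5])
factual = penalized_neighbors(feats, pop, k=1, alpha=0.0)
penalized = penalized_neighbors(feats, pop, k=1, alpha=0.5)
```

Setting `alpha=0.0` gives the unpenalized neighborhood, while `alpha=0.5` asks which neighbors would be chosen if popularity carried a cost. In the toy data, item 0's nearest neighbor changes from the popular item 1 to the low-exposure item 2.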
Problem

Research questions and friction points this paper is trying to address.

multimodal recommendation
ID-free representation
popularity bias
long-tail semantic relations
modality semantics
Innovation

Methods, ideas, or system contributions that make the work stand out.

modality-aware identity construction
counterfactual structure learning
ID-free recommendation
popularity bias mitigation
multimodal semantics
Hongjian Ma
School of Computer Science, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, Wuhan 430062, China; Key Laboratory of Intelligent Sensing System and Security (Ministry of Education), Hubei University, Wuhan 430062, China
Wenxin Huang
School of Computer Science, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, Wuhan 430062, China; Key Laboratory of Intelligent Sensing System and Security (Ministry of Education), Hubei University, Wuhan 430062, China
Yan Zhang
School of Computer Science, Hubei University, Wuhan 430062, China; Hubei Key Laboratory of Big Data Intelligent Analysis and Application, Hubei University, Wuhan 430062, China; Key Laboratory of Intelligent Sensing System and Security (Ministry of Education), Hubei University, Wuhan 430062, China
Zhifei Li
Research Scientist at Google
machine translation, natural language processing, machine learning, wireless networks
Zheng Wang
Wuhan University
Multimedia Content Analysis, Computer Vision, Artificial Intelligence