MyGram: Modality-aware Graph Transformer with Global Distribution for Multi-modal Entity Alignment

📅 2026-01-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of existing multimodal entity alignment methods, which are often susceptible to superficial feature interference and overlook deep structural context within individual modalities. To overcome these issues, the authors propose MyGram, a novel model that integrates image and text semantics through a modality-aware graph Transformer and captures high-order intra-modal structural information via a modality diffusion learning module. Furthermore, they introduce a Gram Loss regularizer based on minimizing the volume of a four-dimensional parallelotope to enforce global cross-modal distributional consistency. Extensive experiments on five benchmark datasets demonstrate that MyGram significantly outperforms state-of-the-art approaches, achieving Hits@1 improvements of 4.8%, 9.9%, and 4.3% on FBDB15K, FBYG15K, and DBP15K, respectively.

πŸ“ Abstract
Multi-modal entity alignment aims to identify equivalent entities between two multi-modal knowledge graphs by integrating multi-modal data, such as images and text, to enrich the semantic representations of entities. However, existing methods may overlook the structural contextual information within each modality, making them vulnerable to interference from shallow features. To address these challenges, we propose MyGram, a modality-aware graph Transformer with global distribution for multi-modal entity alignment. Specifically, we develop a modality diffusion learning module to capture deep structural contextual information within modalities and enable fine-grained multi-modal fusion. In addition, we introduce a Gram Loss that acts as a regularization constraint by minimizing the volume of the 4-dimensional parallelotope formed by multi-modal features, thereby enforcing global distribution consistency across modalities. We conduct experiments on five public datasets. Results show that MyGram outperforms baseline models, achieving maximum Hits@1 improvements of 4.8% on FBDB15K, 9.9% on FBYG15K, and 4.3% on DBP15K.
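The "volume of a 4-dimensional parallelotope" in the abstract is, by standard linear algebra, the square root of the Gram determinant of the four modality feature vectors. A minimal sketch of such a regularizer, assuming four unit-normalized modality embeddings (e.g. structure, image, text, and attribute vectors); the function name and the epsilon value are illustrative, not taken from the paper:

```python
import numpy as np

def gram_loss(features, eps=1e-8):
    """Gram-determinant regularizer sketch (not the paper's exact code).

    features: array of shape (4, d), one row per modality embedding.
    The squared volume of the parallelotope spanned by the four rows
    equals det(G), where G[i, j] = <f_i, f_j> is the Gram matrix.
    Driving this volume toward zero pushes the modality features
    toward a shared low-dimensional subspace, i.e. toward a globally
    consistent cross-modal distribution.
    """
    # Unit-normalize each modality vector so the loss reflects angles,
    # not magnitudes.
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    G = f @ f.T                          # 4x4 Gram matrix of inner products
    vol_sq = max(np.linalg.det(G), 0.0)  # clamp tiny negative round-off
    return np.sqrt(vol_sq + eps)         # parallelotope volume
```

Perfectly aligned modality vectors give a rank-deficient Gram matrix and a loss near zero, while mutually orthogonal vectors give the maximal volume of 1, so minimizing this term rewards cross-modal agreement.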
Problem

Research questions and friction points this paper is trying to address.

multi-modal entity alignment
structural contextual information
shallow features
modality integration
semantic representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Modality-aware Graph Transformer
Modality Diffusion Learning
Gram Loss
Global Distribution Consistency
Multi-modal Entity Alignment
Zhifei Li
Research Scientist at Google
machine translation · natural language processing · machine learning · wireless networks
Ziyue Qin
School of Computer Science, Hubei University, Wuhan 430062, China
Xiangyu Luo
School of Cyber Science and Technology, Hubei University, Wuhan 430062, China
Xiaoju Hou
Institute of Vocational Education, Guangdong Industry Polytechnic University, Guangzhou 510300, China
Yue Zhao
Shandong Police College, Jinan 250200, China
Miao Zhang
School of Computer Science, Hubei University, Wuhan 430062, China
Zhifang Huang
School of Computer Science, Hubei University, Wuhan 430062, China
Kui Xiao
School of Computer Science, Hubei University, Wuhan 430062, China
Bing Yang
School of Computer Science, Hubei University, Wuhan 430062, China