M-MRE: Extending the Mutual Reinforcement Effect to Multimodal Information Extraction

πŸ“… 2025-04-24
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Mutual Reinforcement Effect (MRE), previously studied only in text-based information extraction, had not been explored in visual or multimodal scenarios. Method: This paper introduces the Multimodal Mutual Reinforcement Effect (M-MRE), the first extension of MRE to multimodal information extraction. The authors construct the first dedicated M-MRE benchmark for joint image-text understanding and propose a Prompt Format Adapter (PFA) that enables plug-and-play integration with diverse Large Vision-Language Models (LVLMs). The approach unifies text extraction, image understanding, and cross-modal reasoning through joint modeling of three tasks at different granularities. Contribution/Results: Experiments show consistent performance gains across the three interrelated tasks, confirming that MRE's effectiveness and generalizability carry over to multimodal settings and offering a scalable pathway for multimodal information extraction beyond unimodal MRE.
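The summary describes the Prompt Format Adapter (PFA) only at a high level. As a rough illustration of what a plug-and-play prompt-templating layer for LVLMs could look like, here is a minimal Python sketch; the task triple, template wording, message schema, and parser are all illustrative assumptions, not the authors' actual design.

```python
# Hypothetical sketch of a Prompt Format Adapter (PFA): a thin templating
# layer that serializes three extraction tasks of different granularity
# into one instruction any chat-style LVLM can consume, and splits the
# answer back apart. Task names, wording, message schema, and parser are
# illustrative assumptions, not the paper's actual design.
from dataclasses import dataclass

@dataclass
class MMRESample:
    text: str        # textual side of the image-text pair
    image_path: str  # path to the paired image

TASKS = {
    "coarse": "Classify the overall topic of the image-text pair.",
    "fine": "List the named entities in the text with their types.",
    "cross": "State which image regions ground each extracted entity.",
}

def build_prompt(sample: MMRESample) -> list[dict]:
    """Pack all three sub-tasks into one multimodal chat message so
    coarse and fine predictions can reinforce each other in one pass."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(TASKS.values()))
    return [{
        "role": "user",
        "content": [
            {"type": "image", "path": sample.image_path},
            {"type": "text",
             "text": f"Text: {sample.text}\n\nAnswer all of the following:\n{numbered}"},
        ],
    }]

def parse_answer(raw: str) -> dict[str, str]:
    """Naively split a numbered answer back into per-task fields,
    assuming the model mirrors the numbering in its reply."""
    out = {}
    for i, key in enumerate(TASKS):
        start = raw.find(f"{i + 1}.")
        end = raw.find(f"{i + 2}.", start) if start != -1 else -1
        out[key] = "" if start == -1 else raw[start + 2:end if end != -1 else None].strip()
    return out
```

Because such an adapter only rewrites prompts and parses outputs, swapping in a different LVLM is a change to whichever client sends the message list, which is presumably what makes the design plug-and-play.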

πŸ“ Abstract
Mutual Reinforcement Effect (MRE) is an emerging subfield at the intersection of information extraction and model interpretability. MRE aims to leverage the mutual understanding between tasks of different granularities, enhancing the performance of both coarse-grained and fine-grained tasks through joint modeling. While MRE has been explored and validated in the textual domain, its applicability to visual and multimodal domains remains unexplored. In this work, we extend MRE to the multimodal information extraction domain for the first time. Specifically, we introduce a new task: Multimodal Mutual Reinforcement Effect (M-MRE), and construct a corresponding dataset to support this task. To address the challenges posed by M-MRE, we further propose a Prompt Format Adapter (PFA) that is fully compatible with various Large Vision-Language Models (LVLMs). Experimental results demonstrate that MRE can also be observed in the M-MRE task, a multimodal text-image understanding scenario. This provides strong evidence that MRE facilitates mutual gains across three interrelated tasks, confirming its generalizability beyond the textual domain.
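To make the mutual-reinforcement idea concrete: in text-based MRE work, one common formulation serializes the coarse-grained label and the fine-grained spans into a single generation target, so one model is trained on both views jointly. A minimal sketch under that assumption follows; the tag scheme and example data are illustrative, not from the paper.

```python
# Minimal illustration of the mutual-reinforcement setup: a coarse label
# and fine-grained entity spans are serialized into one generation target,
# so a single model learns both views jointly and each can correct the
# other. The tag scheme and example are illustrative, not from the paper.
def make_joint_target(label: str, entities: list[tuple[str, str]]) -> str:
    ents = " , ".join(f"{span} ({etype})" for span, etype in entities)
    return f"label: {label} ; entities: {ents}"

print(make_joint_target("sports", [("Messi", "PERSON"), ("Inter Miami", "ORG")]))
# -> label: sports ; entities: Messi (PERSON) , Inter Miami (ORG)
```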
Problem

Research questions and friction points this paper is trying to address.

MRE has been validated only in text-based information extraction; whether it also holds in visual and multimodal settings was unknown
No task definition or dataset existed for studying mutual reinforcement across image-text tasks of different granularity
Joint multi-granularity extraction prompts are not directly compatible with the varied input formats of off-the-shelf Large Vision-Language Models
Innovation

Methods, ideas, or system contributions that make the work stand out.

First extension of MRE beyond text to multimodal information extraction
Defines the Multimodal Mutual Reinforcement Effect (M-MRE) task and builds a dedicated image-text dataset for it
Proposes a Prompt Format Adapter (PFA) compatible with diverse Large Vision-Language Models (LVLMs)
πŸ”Ž Similar Papers
Chengguang Gan
Yokohama National University, Yokohama, Japan
Sunbowen Lee
College of Science, Wuhan University of Science and Technology, Wuhan, China
Zhixi Cai
Research Fellow, Monash University
Yanbin Wei
Southern University of Science and Technology, Shenzhen, China; Hong Kong University of Science and Technology, Hong Kong, China
Lei Zheng
Computer Science, Shanghai Jiao Tong University, Shanghai, China
Yunhao Liang
University of Chinese Academy of Sciences, Beijing, China
Shiwen Ni
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Tatsunori Mori
Yokohama National University, Yokohama, Japan