The 2nd EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval

📅 2026-05-26

🤖 AI Summary

Large-scale multimodal foundation models face significant efficiency bottlenecks in training, deployment, and inference, which hinder their practical use in information retrieval. To address this challenge, this work proposes the 2nd EReL@MIR workshop, which focuses on efficient representation learning and adaptation techniques for multimodal retrieval in the foundation model era, exemplified by models such as CLIP, LLaVA, and Qwen. The workshop examines efficiency challenges and evaluation criteria, promotes the development of new efficiency metrics and benchmarks, and provides a collaborative platform bridging academia and industry. By releasing relevant resources and encouraging community engagement, it aims to accelerate progress toward scalable and efficient multimodal retrieval methods.

📝 Abstract

Multimodal representation learning has attracted increasing attention in AI, driven by the strong performance of large, pretrained multimodal foundation models such as Qwen, LLaVA, and CLIP. These models deliver impressive performance on a range of multimodal information retrieval (MIR) tasks, including web search, cross-modal retrieval, and recommender systems. Yet their massive parameter counts create major efficiency bottlenecks when adapting their representations for IR tasks during training, deployment, and inference. These limitations hinder the practical use of foundation models for representation learning in information retrieval. To address these issues, we propose organizing the EReL@MIR workshop at MM 2026, bringing together researchers from academia and industry to discuss emerging solutions, open challenges, and new efficiency metrics and benchmarks for multimodal IR representation learning in the foundation-model era. The workshop's official website is available at https://erel-mir.github.io/.

Problem

Research questions and friction points this paper is trying to address.

multimodal representation learning

information retrieval

foundation models

efficiency bottleneck

multimodal information retrieval

Innovation

Methods, ideas, or system contributions that make the work stand out.

Efficient Representation Learning

Multimodal Information Retrieval

Foundation Models