🤖 AI Summary
This work addresses the limitations of fixed single-granularity retrieval units in open-domain multimodal multi-hop retrieval, which introduce noise and struggle to capture cross-document semantic relationships. To overcome these challenges, the authors propose a hierarchical component graph structure that jointly models multimodal information at both coarse and fine granularities. They further design an edge-based late-interaction subgraph retrieval mechanism that first performs coarse-grained candidate filtering followed by fine-grained reasoning. This approach achieves state-of-the-art retrieval performance across all five benchmark datasets without requiring additional fine-tuning, effectively balancing computational efficiency with multi-hop reasoning accuracy.
📝 Abstract
Multimodal document retrieval aims to retrieve query-relevant components from documents composed of textual, tabular, and visual elements. An effective multimodal retriever must address two main challenges: (1) mitigating the effect of irrelevant content introduced by fixed, single-granularity retrieval units, and (2) supporting multi-hop reasoning by effectively capturing semantic relationships among components within and across documents. To address these challenges, we propose LILaC, a multimodal retrieval framework featuring two core innovations. First, we introduce a layered component graph that explicitly represents multimodal information at two layers, one coarse-grained and one fine-grained, facilitating efficient yet precise reasoning. Second, we develop a late-interaction-based subgraph retrieval method: an edge-based approach that first identifies coarse-grained nodes for efficient candidate generation, then performs fine-grained reasoning via late interaction. Extensive experiments demonstrate that LILaC achieves state-of-the-art retrieval performance on all five benchmarks, notably without additional fine-tuning. We make the artifacts publicly available at github.com/joohyung00/lilac.
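The coarse-to-fine retrieval idea in the abstract can be illustrated with a minimal sketch. This is not the paper's actual implementation: the node layout (`coarse_vec` pooled embeddings, `fine_toks` token-level embeddings) and the two-stage function are hypothetical, and the fine-grained step uses a generic ColBERT-style MaxSim late-interaction score as a stand-in for the paper's edge-based method.

```python
import numpy as np

def maxsim(query_toks: np.ndarray, comp_toks: np.ndarray) -> float:
    # Late interaction (ColBERT-style MaxSim): each query token matches
    # its best component token; per-token maxima are summed.
    # Rows are assumed unit-normalized, so dot products are cosine sims.
    sims = query_toks @ comp_toks.T  # shape (num_query_toks, num_comp_toks)
    return float(sims.max(axis=1).sum())

def coarse_to_fine_retrieve(query_toks: np.ndarray, nodes: list, k: int = 10) -> list:
    """Hedged sketch of two-stage retrieval: cheap coarse filtering by
    pooled-vector similarity, then fine-grained late-interaction re-ranking.
    `nodes` is a hypothetical list of dicts carrying a pooled 'coarse_vec'
    and token-level 'fine_toks' embeddings per graph node."""
    q_pooled = query_toks.mean(axis=0)
    # Stage 1: coarse candidate generation via a single dot product per node.
    coarse = sorted(nodes,
                    key=lambda n: float(n["coarse_vec"] @ q_pooled),
                    reverse=True)[:k]
    # Stage 2: fine-grained re-ranking of the surviving candidates.
    return sorted(coarse,
                  key=lambda n: maxsim(query_toks, n["fine_toks"]),
                  reverse=True)
```

The design point mirrored here is the cost split: the coarse stage touches every node but does only one dot product each, while the quadratic token-to-token MaxSim runs only on the small candidate set, which is how late interaction stays affordable at corpus scale.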