HotelMatch-LLM: Joint Multi-Task Training of Small and Large Language Models for Efficient Multimodal Hotel Retrieval

📅 2025-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Traditional travel search engines rely on structured inputs and struggle to support hotel retrieval via natural language queries. Method: This paper proposes a domain-specific multimodal dense retrieval model for travel. It introduces (1) a three-task joint optimization framework tailored to the travel domain; (2) an asymmetric retrieval architecture that pairs a small language model (for online query encoding) with a large language model (for offline hotel representation); and (3) image feature aggregation and alignment over each property's full image gallery. Results: Evaluated on four test sets, the model achieves Recall@10 of 0.681 on the main query type, a 12.9% relative improvement over the strongest baseline, MARVEL (0.603), and it also outperforms other baselines such as VISTA.
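The headline numbers can be checked with a few lines of Python. The sketch below assumes the common definition of Recall@k (the fraction of relevant items found in the top k results); the paper's exact evaluation protocol is not given here, and the function names are illustrative.

```python
def recall_at_k(ranked_ids, relevant_ids, k=10):
    """Fraction of relevant items that appear in the top-k of a ranking.

    Assumption: this is the standard Recall@k definition; the paper may
    compute it differently (e.g. averaged per query type).
    """
    if not relevant_ids:
        return 0.0
    top_k = set(ranked_ids[:k])
    return len(top_k & set(relevant_ids)) / len(relevant_ids)


def relative_gain(new, baseline):
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Scores reported for the main query type: HotelMatch-LLM vs. MARVEL.
gain = relative_gain(0.681, 0.603)
print(f"{gain:.1%}")  # 12.9%
```

This confirms that the 12.9% figure is a relative gain; the absolute difference is 0.078.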

📝 Abstract

We present HotelMatch-LLM, a multimodal dense retrieval model for the travel domain that enables natural language property search, addressing the limitations of traditional travel search engines, which require users to start with a destination and then edit search parameters. HotelMatch-LLM features three key innovations: (1) Domain-specific multi-task optimization with three novel retrieval, visual, and language modeling objectives; (2) Asymmetrical dense retrieval architecture combining a small language model (SLM) for efficient online query processing and a large language model (LLM) for embedding hotel data; and (3) Extensive image processing to handle all property image galleries. Experiments on four diverse test sets show HotelMatch-LLM significantly outperforms state-of-the-art models, including VISTA and MARVEL. Specifically, on the test set for the main query type, we achieve 0.681 for HotelMatch-LLM compared to 0.603 for the most effective baseline, MARVEL. Our analysis highlights the impact of our multi-task optimization, the generalizability of HotelMatch-LLM across LLM architectures, and its scalability for processing large image galleries.
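The asymmetric design can be pictured as two encoders that write into one shared embedding space: an expensive one run offline over hotels, and a cheap one run per query. The following is a minimal sketch of that flow only, not the authors' implementation. `VOCAB`, `encode_hotel_offline`, and `encode_query_online` are hypothetical stand-ins (a toy keyword-count embedding replaces both the LLM and the SLM), and dot-product scoring is an assumption.

```python
import math

# Toy feature vocabulary; a real system would use learned embeddings.
VOCAB = ["beachfront", "pool", "spa", "budget", "station", "ski", "fireplace", "quiet"]


def _toy_embed(text):
    """Stand-in encoder: keyword counts over VOCAB, L2-normalised."""
    tokens = text.lower().split()
    vec = [float(tokens.count(word)) for word in VOCAB]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def encode_hotel_offline(description):
    """Placeholder for the large model; run once per hotel, ahead of time."""
    return _toy_embed(description)


def encode_query_online(query):
    """Placeholder for the small model; run at request time."""
    return _toy_embed(query)


def build_index(hotels):
    """Precompute and store one embedding per hotel (offline step)."""
    return {hid: encode_hotel_offline(text) for hid, text in hotels.items()}


def search(query, index, k=10):
    """Encode the query, score it against stored hotel vectors, return top-k ids."""
    q = encode_query_online(query)
    scores = {hid: sum(a * b for a, b in zip(q, vec)) for hid, vec in index.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


hotels = {
    "h1": "beachfront resort with infinity pool and spa",
    "h2": "budget hostel near the central train station",
    "h3": "mountain lodge with ski access and fireplace",
}
index = build_index(hotels)
print(search("quiet place with a pool and spa", index, k=1))  # ['h1']
```

The point of the split is that only `encode_query_online` sits on the latency-critical path, so the hotel side can afford a much larger model.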

Problem

Research questions and friction points this paper is trying to address.

Enables natural language hotel search without destination constraints

Combines small and large language models for efficient retrieval

Processes extensive property image galleries for multimodal search

Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-specific multi-task optimization with novel objectives

Asymmetrical dense retrieval combining SLM and LLM

Extensive image processing for property galleries
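Two of these ideas can be illustrated in a few lines, with the caveat that this page does not say how the paper implements them. The sketch below assumes mean pooling to collapse a gallery of image embeddings into one vector and a weighted sum to combine the three task losses. Both choices, along with the names `mean_pool` and `joint_loss`, are illustrative assumptions rather than the authors' method.

```python
def mean_pool(image_embeddings):
    """Average a gallery of equal-length image embeddings into one vector.

    Assumption: mean pooling is used only as the simplest aggregation;
    the paper may use a different scheme.
    """
    if not image_embeddings:
        raise ValueError("gallery is empty")
    n = len(image_embeddings)
    return [sum(column) / n for column in zip(*image_embeddings)]


def joint_loss(retrieval, visual, language, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three task losses (the weights are illustrative)."""
    w_r, w_v, w_l = weights
    return w_r * retrieval + w_v * visual + w_l * language


gallery = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(mean_pool(gallery))  # [0.5, 0.5]
```

Mean pooling keeps the hotel representation at a fixed size regardless of how many images a property has, which is one simple way to handle galleries of varying length.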

🔎 Similar Papers

MMREC: LLM Based Multi-Modal Recommender System