From Raw Features to Effective Embeddings: A Three-Stage Approach for Multimodal Recipe Recommendation

📅 2025-11-24
🤖 AI Summary
To address the under-use of raw multimodal features (such as images, textual descriptions, and nutritional information) in recipe recommendation, this paper proposes TESMR, a three-stage enhancement framework. First, content-based enhancement extracts multimodal semantic representations with foundation models (e.g., CLIP and BERT). Second, relation-based enhancement propagates messages over the user-recipe interaction graph. Third, learning-based enhancement applies contrastive learning with learnable embeddings to refine cross-modal alignment and make the representations more discriminative. Together, the stages combine content understanding, structural modeling, and representation learning to improve embedding quality. Experiments on two real-world datasets show that TESMR outperforms existing methods, with 7-15% higher Recall@10.
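The Recall@10 figure quoted above can be read against one common definition of Recall@K: the share of a user's held-out relevant recipes that appear in the top K recommendations. The sketch below is illustrative only; the function name `recall_at_k` and the toy recipe IDs are assumptions, and the paper's exact evaluation protocol may differ.

```python
def recall_at_k(ranked_items, relevant_items, k=10):
    """Fraction of a user's relevant items that appear in the top-k ranking.

    ranked_items:   list of item IDs, best first (hypothetical model output).
    relevant_items: set of held-out item IDs the user actually interacted with.
    """
    if not relevant_items:
        return 0.0
    top_k = set(ranked_items[:k])
    hits = sum(1 for item in relevant_items if item in top_k)
    return hits / len(relevant_items)


# Toy example: one of the two relevant recipes is ranked in the top 3.
ranked = ["r3", "r7", "r1", "r9"]
relevant = {"r1", "r4"}
score = recall_at_k(ranked, relevant, k=3)
```

In practice the per-user scores are averaged over all test users to give the dataset-level number.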

📝 Abstract
Recipe recommendation has become an essential task in web-based food platforms. A central challenge is effectively leveraging rich multimodal features beyond user-recipe interactions. Our analysis shows that even simple uses of multimodal signals yield competitive performance, suggesting that systematic enhancement of these signals is highly promising. We propose TESMR, a 3-stage framework for recipe recommendation that progressively refines raw multimodal features into effective embeddings through: (1) content-based enhancement using foundation models with multimodal comprehension, (2) relation-based enhancement via message propagation over user-recipe interactions, and (3) learning-based enhancement through contrastive learning with learnable embeddings. Experiments on two real-world datasets show that TESMR outperforms existing methods, achieving 7-15% higher Recall@10.
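As a rough illustration of stage (2), message propagation over user-recipe interactions, the sketch below runs one round of mean aggregation on a bipartite graph. The abstract does not say which aggregator or normalization TESMR uses, so mean aggregation, the function name `propagate`, and the toy embeddings are all assumptions made for the example.

```python
def propagate(user_emb, recipe_emb, interactions):
    """One round of mean-aggregation message passing on a bipartite graph.

    user_emb / recipe_emb: dicts mapping an ID to a list of floats.
    interactions: list of (user_id, recipe_id) pairs.
    Nodes with no neighbours keep their current embedding.
    """
    dim = len(next(iter(recipe_emb.values())))
    user_nbrs = {u: [] for u in user_emb}
    recipe_nbrs = {r: [] for r in recipe_emb}
    for u, r in interactions:
        user_nbrs[u].append(r)
        recipe_nbrs[r].append(u)

    def mean(vectors, fallback):
        if not vectors:
            return list(fallback)
        return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

    # Users aggregate from the recipes they interacted with, and vice versa.
    new_user = {u: mean([recipe_emb[r] for r in nbrs], user_emb[u])
                for u, nbrs in user_nbrs.items()}
    new_recipe = {r: mean([user_emb[u] for u in nbrs], recipe_emb[r])
                  for r, nbrs in recipe_nbrs.items()}
    return new_user, new_recipe


# Toy graph: u1 interacted with r1 and r2, u2 only with r2.
users = {"u1": [0.0, 0.0], "u2": [1.0, 1.0]}
recipes = {"r1": [1.0, 0.0], "r2": [0.0, 1.0]}
edges = [("u1", "r1"), ("u1", "r2"), ("u2", "r2")]
new_users, new_recipes = propagate(users, recipes, edges)
```

Here the recipe embeddings stand in for the content-based features from stage (1); stacking several such rounds lets signals travel between users who share recipes.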

Problem

Research questions and friction points this paper is trying to address.

Effectively leveraging multimodal features for recipe recommendation systems
Progressively refining raw features into effective embeddings through three stages
Enhancing recommendation performance beyond basic user-recipe interactions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Three-stage framework refining raw multimodal features
Content enhancement using foundation models
Relation enhancement via message propagation
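Alongside the content and relation stages listed here, the abstract names a third stage: contrastive learning with learnable embeddings. The abstract does not give the loss, so the sketch below uses a generic InfoNCE objective as a stand-in; the function `info_nce`, the temperature value, and the toy vectors are assumptions, not the paper's formulation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, positives, temperature=0.2):
    """Mean InfoNCE loss: anchors[i] should match positives[i];
    every other positives[j] in the batch acts as a negative."""
    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        logits = [dot(a_i, p_j) / temperature for p_j in p]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


# Toy check: matched pairs give a lower loss than swapped pairs.
learnable = [[1.0, 0.0], [0.0, 1.0]]  # e.g. learnable ID embeddings
aligned = info_nce(learnable, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(learnable, [[0.0, 1.0], [1.0, 0.0]])
```

In a trained model the loss would be minimized by gradient descent over the learnable embeddings; this sketch only evaluates it.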