Meta-Modal Agent: Sequential Evidence Routing for Missing-Modality Candidate Reranking

📅 2026-05-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation of multimodal recommender systems in cold-start scenarios, where user histories, item text, or visual evidence are often missing. The authors propose the Meta-Modal Agent (MMA), a large language model-based candidate-pool reranker that treats missing-modality handling as a sequential evidence-routing problem. MMA is trained with reinforcement learning over masked-modality episodes and comes in two variants: MMA-Auto, which uses only automated text, image, and graph tools, and MMA-Interactive, which adds clarification questions as an upper-bound diagnostic. MMA-Auto improves NDCG@10 by 4.0% under one-observed-modality availability (OOMA) settings and by 12.7% in fixed-pool full-catalog reranking over the strongest non-interactive baseline. A deterministic-router control (RuleRouter-Fuse) that uses the same tools and fusion rule without learned policy updates underperforms MMA-Auto. MMA-Interactive yields an additional 4.1% upper-bound gain when clarification is available.
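For readers unfamiliar with the reported metric, the following is a minimal sketch of NDCG@10 under binary relevance. The function names and the binary-relevance assumption are illustrative and are not taken from the paper, which may define gains or tie handling differently.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k positions of a ranked list."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_items, positives, k=10):
    """NDCG@k with binary relevance (assumption): gain 1 for a held-out positive, else 0."""
    gains = [1.0 if item in positives else 0.0 for item in ranked_items]
    # The ideal ranking places every positive at the top of the list.
    ideal_dcg = dcg_at_k([1.0] * min(len(positives), k), k)
    return dcg_at_k(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical reranked pool with one held-out positive at rank 2.
    print(ndcg_at_k(["item_b", "item_a", "item_c"], {"item_a"}))
```

A positive at rank 1 scores 1.0, and the score decays logarithmically as the positive moves down the list, so small percentage gains in NDCG@10 typically reflect positives moving up within the top ten.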

📝 Abstract

Missing modalities cause severe failures in multimodal recommender systems. User histories, item text, and visual evidence are frequently absent during cold-start scenarios, exactly when recommendation quality matters most. Existing approaches recover absent signals through imputation, feature propagation, or generative reconstruction, but these strategies can inject unsupported evidence when the surviving signals are weak. We introduce the Meta-Modal Agent (MMA), a large language model-based candidate-pool reranker that treats missingness as a sequential evidence-routing problem. MMA is trained with balanced missingness-task reinforcement learning over masked-modality episodes and is evaluated in two variants: MMA-Auto, which uses only automated text, image, and graph tools, and MMA-Interactive, which additionally permits clarification questions grounded in surviving modalities as an upper-bound diagnostic. MMA operates after a first-stage retriever has produced a candidate pool; it scores those candidates rather than retrieving items from the full catalog. Final reranking fuses MMA scores with first-stage retrieval scores selected on validation data. Our evaluation is organized around four evidence checks required for a robust missing-modality claim: oracle-free one-observed-modality availability (OOMA) robustness, per-modality OOMA breakdowns, fixed-pool full-catalog reranking, and a deterministic-router mechanism control. MMA-Auto improves target-positive OOMA NDCG@10 by 4.0% and fixed-pool full-catalog reranking NDCG@10 by 12.7% over the strongest non-interactive baseline. RuleRouter-Fuse, which uses the same tools and fusion rule without learned policy updates, underperforms MMA-Auto, supporting learned routing beyond deterministic tool fusion. MMA-Interactive adds a 4.1% upper-bound gain when clarification is available.
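The abstract does not spell out the fusion rule, so the following is only a sketch of one common way to fuse reranker scores with first-stage retrieval scores over a fixed candidate pool. The convex combination, the min-max normalization, the grid of weights, and every name here (`fuse_and_rank`, `select_alpha`, the episode dictionary keys) are assumptions made for illustration, not the paper's method.

```python
import math


def min_max(scores):
    """Rescale a {item: score} mapping to [0, 1]; a constant pool maps to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {item: 0.0 for item in scores}
    return {item: (s - lo) / (hi - lo) for item, s in scores.items()}


def fuse_and_rank(agent_scores, retriever_scores, alpha):
    """Rank a candidate pool by alpha * agent + (1 - alpha) * retriever (assumed rule).

    Both mappings must cover the same candidate pool.
    """
    a, r = min_max(agent_scores), min_max(retriever_scores)
    fused = {item: alpha * a[item] + (1 - alpha) * r[item] for item in a}
    return sorted(fused, key=fused.get, reverse=True)


def ndcg_at_10(ranking, positives):
    """Binary-relevance NDCG@10 (assumption: gain 1 per held-out positive)."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranking[:10]) if item in positives)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(positives), 10)))
    return dcg / ideal if ideal > 0 else 0.0


def select_alpha(val_episodes, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Pick the fusion weight with the best mean validation NDCG@10."""
    def mean_ndcg(alpha):
        total = sum(
            ndcg_at_10(fuse_and_rank(ep["agent"], ep["retriever"], alpha), ep["positives"])
            for ep in val_episodes
        )
        return total / len(val_episodes)
    return max(grid, key=mean_ndcg)


if __name__ == "__main__":
    # Hypothetical validation episode: the agent favors the positive, the retriever does not.
    episode = {
        "agent": {"x": 0.9, "y": 0.1, "z": 0.2},
        "retriever": {"x": 0.1, "y": 0.9, "z": 0.5},
        "positives": {"x"},
    }
    alpha = select_alpha([episode])
    print(alpha, fuse_and_rank(episode["agent"], episode["retriever"], alpha))
```

Because the pool is fixed by the first-stage retriever, this kind of fusion can only reorder candidates; it cannot recover a positive that the retriever failed to surface.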

Problem

Research questions and friction points this paper is trying to address.

missing modality

multimodal recommendation

cold-start

evidence routing

candidate reranking

Innovation

Methods, ideas, or system contributions that make the work stand out.

missing-modality

evidence routing

multimodal recommendation