🤖 AI Summary
Fashion recommendation faces challenges including rapid trend evolution, implicit user preferences, complex outfit composition, and highly dynamic multi-stakeholder interactions (e.g., users, brands, influencers), rendering conventional static retrieval-based approaches inadequate. To address these, the authors propose the Agentic Mixed-Modality Refinement (AMMR) pipeline: it employs LLM-powered agents endowed with reasoning and planning capabilities, integrating image-based anchors and textual constraints to support multi-turn dialogue and outfit generation; further, it unifies multimodal encoding, dynamic retrieval, and real-time inventory alignment for intent-driven generative recommendation. Experiments demonstrate significant improvements in cross-scenario intent understanding accuracy (+18.7%) and user satisfaction (NPS +23.4%), validating AMMR's effectiveness in advancing fashion recommendation toward a dynamic, generative, and stakeholder-aware paradigm.
📝 Abstract
Fashion recommender systems (FaRS) face distinct challenges due to rapid trend shifts, nuanced user preferences, intricate item-item compatibility, and the complex interplay among consumers, brands, and influencers. Traditional recommendation approaches, largely static and retrieval-focused, struggle to capture these dynamic elements, leading to decreased user satisfaction and elevated return rates. This paper synthesizes both academic and industrial viewpoints to map the distinctive output space and stakeholder ecosystem of modern FaRS, characterizing the interactions among users, brands, platforms, and influencers, and highlighting the unique data and modeling challenges that arise.
We outline a research agenda for industrial FaRS, centered on five representative scenarios spanning static queries, outfit composition, and multi-turn dialogue, and argue that mixed-modality refinement, the ability to combine image-based references (anchors) with nuanced textual constraints, is a particularly critical task for real-world deployment. To this end, we propose an Agentic Mixed-Modality Refinement (AMMR) pipeline, which fuses multimodal encoders with agentic LLM planners and dynamic retrieval, bridging the gap between expressive user intent and fast-changing fashion inventories. Our work shows that moving beyond static retrieval toward adaptive, generative, and stakeholder-aware systems is essential to satisfy the evolving expectations of fashion consumers and brands.
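The core mixed-modality refinement step described above (an image anchor blended with a textual constraint, retrieved against a live inventory) can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the embeddings are hand-made 3-d vectors standing in for the multimodal encoder outputs, and the item names, `refine` function, and blending weight `alpha` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def refine(anchor_emb, constraint_emb, inventory, alpha=0.6, k=2):
    """Blend anchor and constraint embeddings, then rank in-stock items.

    alpha weights the image anchor against the textual constraint;
    out-of-stock items are filtered, mirroring the inventory-alignment
    step described in the abstract.
    """
    query = [alpha * a + (1 - alpha) * c
             for a, c in zip(anchor_emb, constraint_emb)]
    scored = [(cosine(query, item["emb"]), item["id"])
              for item in inventory if item["in_stock"]]
    return [item_id for _, item_id in sorted(scored, reverse=True)[:k]]

# Toy inventory with stand-in embeddings and stock flags.
inventory = [
    {"id": "midi-dress", "emb": [0.9, 0.1, 0.2], "in_stock": True},
    {"id": "denim-jacket", "emb": [0.1, 0.9, 0.1], "in_stock": True},
    {"id": "silk-scarf", "emb": [0.8, 0.2, 0.3], "in_stock": False},
]

# Anchor: a dress-like reference image; constraint: e.g. "in a lighter fabric".
print(refine([1.0, 0.0, 0.1], [0.5, 0.2, 0.8], inventory))
# → ['midi-dress', 'denim-jacket']
```

In a real deployment the toy vectors would come from a multimodal encoder, and an agentic planner would decide how to weight the anchor versus the constraint across dialogue turns; the out-of-stock `silk-scarf` is dropped before ranking regardless of its similarity.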