Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models

📅 2025-03-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Traditional recommender systems are passive, static, and largely unimodal, which limits their adaptability in dynamic, multimodal environments. Method: This perspective paper analyzes LLM-based Agentic Recommender Systems (LLM-ARS), an agentic framework for the era of multimodal large language models (MLLMs). It examines how multimodal perception, tool invocation, external knowledge retrieval, long-term memory modeling, and hierarchical autonomous planning combine to close the loop from environmental perception to intent inference and action decision-making. Contribution/Results: The central idea is that planning, memory, and multimodal reasoning jointly enhance recommendation quality while autonomy is balanced against controllability. The paper outlines key research questions in three areas (safety, efficiency, and lifelong personalization) and argues that agentic capabilities can make recommendations more interactive, context-aware, and proactive in dynamic multimodal scenarios, charting directions for next-generation recommender systems.
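The perception-to-action loop described in the summary can be pictured with a toy sketch. This is not the paper's implementation: the catalog, the tag-overlap ranking, and the names `Memory`, `perceive`, `plan`, and `step` are all hypothetical stand-ins, and a real LLM-ARS would replace each function with model calls, tools, and retrieval.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Memory:
    """Long-term memory: a running log of items the user accepted."""
    liked: list = field(default_factory=list)


# Hypothetical toy catalog: item name -> descriptive tags.
CATALOG = {
    "rain jacket": {"outdoor", "rain"},
    "sunglasses": {"outdoor", "sun"},
    "board game": {"indoor"},
}


def perceive(context: dict) -> set:
    """Map raw context signals to tags (stand-in for multimodal perception)."""
    tags = {"rain"} if context.get("weather") == "rainy" else {"sun"}
    tags.add("outdoor" if context.get("location") == "park" else "indoor")
    return tags


def plan(tags: set, memory: Memory) -> list:
    """Rank items not yet in memory by tag overlap with the context."""
    candidates = [name for name in CATALOG if name not in memory.liked]
    return sorted(candidates, key=lambda name: -len(CATALOG[name] & tags))


def step(context: dict, memory: Memory) -> Optional[str]:
    """One perceive -> plan -> act -> remember cycle."""
    ranked = plan(perceive(context), memory)
    if not ranked:
        return None  # nothing left to recommend
    choice = ranked[0]
    memory.liked.append(choice)  # assumption: the user accepts the suggestion
    return choice
```

Calling `step` repeatedly with the same context yields a different item each time, because the memory log removes already-accepted items from the candidate set. That is the simplest form of the long-term consistency the summary mentions.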

📝 Abstract

Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems that extend beyond the capabilities of standalone models. By empowering LLMs to perceive external environments, integrate multimodal information, and interact with various tools, these agentic systems exhibit greater autonomy and adaptability across complex tasks. This evolution brings new opportunities to recommender systems (RS): LLM-based Agentic RS (LLM-ARS) can offer more interactive, context-aware, and proactive recommendations, potentially reshaping the user experience and broadening the application scope of RS. Despite promising early results, fundamental challenges remain, including how to effectively incorporate external knowledge, balance autonomy with controllability, and evaluate performance in dynamic, multimodal settings. In this perspective paper, we first present a systematic analysis of LLM-ARS: (1) clarifying core concepts and architectures; (2) highlighting how agentic capabilities -- such as planning, memory, and multimodal reasoning -- can enhance recommendation quality; and (3) outlining key research questions in areas such as safety, efficiency, and lifelong personalization. We also discuss open problems and future directions, arguing that LLM-ARS will drive the next wave of RS innovation. Ultimately, we foresee a paradigm shift toward intelligent, autonomous, and collaborative recommendation experiences that more closely align with users' evolving needs and complex decision-making processes.

Problem

Research questions and friction points this paper is trying to address.

Enhancing recommender systems with autonomous, multimodal LLM agents

Balancing autonomy and controllability in agentic recommendation systems

Evaluating performance in dynamic, multimodal recommendation settings

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrate multimodal information for recommendations

Enhance autonomy with planning and memory

Balance autonomy with controllability in dynamic environments
