🤖 AI Summary
Problem: Traditional recommender systems are passive, static, and unimodal, which limits their adaptability in dynamic, multimodal environments. Method: This perspective paper presents the LLM-based Agentic Recommender System (LLM-ARS), an agentic framework built on multimodal large language models (MLLMs). It integrates multimodal perception, tool invocation, external knowledge retrieval, long-term memory modeling, and hierarchical autonomous planning to close the loop from environmental perception to intent inference and action decision-making. Contribution/Results: The core idea is a “planning–memory–multimodal reasoning” co-enhancement paradigm that balances autonomy with controllability. The work outlines three key research directions (safety and controllability, efficient inference, and lifelong personalization) and argues that agentic capabilities can improve the contextual adaptability, interactive proactivity, and long-term consistency of recommendations in dynamic multimodal scenarios, offering conceptual foundations and technical pathways for next-generation intelligent recommendation.
📝 Abstract
Recent breakthroughs in Large Language Models (LLMs) have led to the emergence of agentic AI systems that extend beyond the capabilities of standalone models. By empowering LLMs to perceive external environments, integrate multimodal information, and interact with various tools, these agentic systems exhibit greater autonomy and adaptability across complex tasks. This evolution brings new opportunities to recommender systems (RS): LLM-based Agentic RS (LLM-ARS) can offer more interactive, context-aware, and proactive recommendations, potentially reshaping the user experience and broadening the application scope of RS. Despite promising early results, fundamental challenges remain, including how to effectively incorporate external knowledge, balance autonomy with controllability, and evaluate performance in dynamic, multimodal settings. In this perspective paper, we present a systematic analysis of LLM-ARS: (1) clarifying core concepts and architectures; (2) highlighting how agentic capabilities, such as planning, memory, and multimodal reasoning, can enhance recommendation quality; and (3) outlining key research questions in areas such as safety, efficiency, and lifelong personalization. We also discuss open problems and future directions, arguing that LLM-ARS will drive the next wave of RS innovation. Ultimately, we foresee a paradigm shift toward intelligent, autonomous, and collaborative recommendation experiences that more closely align with users' evolving needs and complex decision-making processes.