🤖 AI Summary
This work proposes AgenticRS, a self-evolving agentic recommendation architecture that reimagines the traditional static multi-stage recommendation pipeline by restructuring system components into autonomous agents capable of decision-making and evolution. To address the challenge of efficient optimization under heterogeneous data and multi-objective business constraints, AgenticRS is grounded in three core design principles: functional closure, independent evaluability, and evolvable decision spaces. It further distinguishes between individual and compositional evolution mechanisms. The architecture leverages large language models to generate candidate structures and employs reinforcement learning with a hierarchical reward scheme that combines internal and external signals to jointly optimize local and global objectives. This study offers a clear and practical pathway for transitioning recommendation systems from rigid pipelines to a dynamic, self-evolving agentic paradigm.
📝 Abstract
Large-scale industrial recommenders typically use a fixed multi-stage pipeline (recall, ranking, re-ranking) and have progressed from collaborative filtering to deep and large pre-trained models. However, both multi-stage and so-called One Model designs remain essentially static: models are black boxes, and system improvement relies on manual hypotheses and engineering, which is hard to scale under heterogeneous data and multi-objective business constraints. We propose an Agentic Recommender System (AgenticRS) that reorganizes key modules as agents. Modules are promoted to agents only when they form a functionally closed loop, can be independently evaluated, and possess an evolvable decision space. For model agents, we outline two self-evolution mechanisms: reinforcement-learning-style optimization in well-defined action spaces, and large-language-model-based generation and selection of new architectures and training schemes in open-ended design spaces. We further distinguish individual evolution of single agents from compositional evolution over how multiple agents are selected and connected, and use a layered inner and outer reward design to couple local optimization with global objectives. This provides a concise blueprint for turning static pipelines into self-evolving agentic recommender systems.
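The layered inner/outer reward design described in the abstract could, for instance, be sketched as a weighted mix of a local improvement signal (how much a single agent beats its own baseline) and a shared global signal (how much the system-level objective improves). All names, signals, and the mixing weight below are illustrative assumptions, not details from the paper:

```python
from dataclasses import dataclass


@dataclass
class LayeredReward:
    """Hypothetical sketch: couples an agent's local reward with a
    system-level reward, as in the abstract's inner/outer design."""

    outer_weight: float = 0.5  # assumed mixing coefficient, not from the paper

    def inner(self, local_metric: float, local_baseline: float) -> float:
        # Inner (local) signal: improvement of one agent over its baseline,
        # e.g. recall@k for a recall agent.
        return local_metric - local_baseline

    def outer(self, global_objective: float, global_baseline: float) -> float:
        # Outer (global) signal shared across agents,
        # e.g. session-level engagement for the whole pipeline.
        return global_objective - global_baseline

    def combined(
        self,
        local_metric: float,
        local_baseline: float,
        global_objective: float,
        global_baseline: float,
    ) -> float:
        # Weighted mix; outer_weight=0 recovers purely local optimization.
        w = self.outer_weight
        return (1 - w) * self.inner(local_metric, local_baseline) + w * self.outer(
            global_objective, global_baseline
        )


reward = LayeredReward(outer_weight=0.5)
print(reward.combined(0.32, 0.30, 0.85, 0.80))
```

With `outer_weight=0` each agent optimizes only its own metric; raising it pulls every agent toward the shared global objective, which is one simple way to realize the coupling the abstract describes.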