Closing the Loop: Coordinating Inventory and Recommendation via Deep Reinforcement Learning on Multiple Timescales

📅 2025-10-05
🤖 AI Summary
This paper addresses the profit lost when inventory replenishment and personalized recommendation are optimized in separate functional silos. It proposes a multi-timescale multi-agent deep reinforcement learning (MARL) framework. Heterogeneous agents are designed by functional decoupling, each with its own update frequency and policy space, and trained with a model-free MARL algorithm whose asymptotic convergence is established theoretically and whose performance is validated in simulation. The key contributions are: (i) a novel cross-functional, multi-timescale coordinated decision-making architecture that departs from isolated, per-function optimization; and (ii) simulation evidence of significantly higher profitability than siloed decision-making, with the trained agents' policies aligning closely with the managerial insights from the paper's theoretical model.

📝 Abstract
Effective cross-functional coordination is essential for enhancing firm-wide profitability, particularly in the face of growing organizational complexity and scale. Recent advances in artificial intelligence, especially in reinforcement learning (RL), offer promising avenues to address this fundamental challenge. This paper proposes a unified multi-agent RL framework tailored for joint optimization across distinct functional modules, exemplified via coordinating inventory replenishment and personalized product recommendation. We first develop an integrated theoretical model to capture the intricate interplay between these functions and derive analytical benchmarks that characterize optimal coordination. The analysis reveals synchronized adjustment patterns across products and over time, highlighting the importance of coordinated decision-making. Leveraging these insights, we design a novel multi-timescale multi-agent RL architecture that decomposes policy components according to departmental functions and assigns distinct learning speeds based on task complexity and responsiveness. Our model-free multi-agent design improves scalability and deployment flexibility, while multi-timescale updates enhance convergence stability and adaptability across heterogeneous decisions. We further establish the asymptotic convergence of the proposed algorithm. Extensive simulation experiments demonstrate that the proposed approach significantly improves profitability relative to siloed decision-making frameworks, while the behaviors of the trained RL agents align closely with the managerial insights from our theoretical model. Taken together, this work provides a scalable, interpretable RL-based solution to enable effective cross-functional coordination in complex business settings.
Problem

Research questions and friction points this paper is trying to address.

Coordinating inventory replenishment with personalized recommendation systems
Optimizing cross-functional decisions via multi-agent reinforcement learning
Improving profitability through synchronized multi-timescale decision frameworks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent RL framework for cross-functional coordination
Multi-timescale learning architecture for heterogeneous decisions
Model-free multi-agent design enhancing scalability and flexibility
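
The multi-timescale idea listed above can be illustrated with a toy two-timescale stochastic gradient ascent. This is only a sketch under stated assumptions, not the paper's algorithm: the quadratic "profit" function, the step-size schedules, the choice of which agent is fast and which is slow, and every name in the code (`step_sizes`, `profit_gradients`, `train`) are hypothetical. Two scalar "agents" share one objective; the fast one uses a step size that decays slowly, the slow one a step size that decays faster, so the ratio of slow to fast step sizes shrinks toward zero.

```python
import random


def step_sizes(k, t0=1000.0):
    """Return (fast, slow) step sizes at iteration k.

    Hypothetical schedules: the slow/fast ratio tends to 0 as k grows,
    which is the usual separation used in two-timescale schemes.
    """
    fast = 0.1 / (1.0 + k / t0) ** 0.6
    slow = 0.05 / (1.0 + k / t0)
    return fast, slow


def profit_gradients(q, r):
    """Gradients of a toy concave profit J(q, r) = -(q - 2r)^2 - (r - 1)^2.

    q stands in for a slow (inventory-like) decision and r for a fast
    (recommendation-like) decision; this mapping is an assumption.
    J is maximized at q = 2, r = 1.
    """
    dq = -2.0 * (q - 2.0 * r)
    dr = 4.0 * (q - 2.0 * r) - 2.0 * (r - 1.0)
    return dq, dr


def train(num_steps=20000, noise_std=0.1, seed=0):
    """Run noisy gradient ascent with separate step sizes per agent."""
    rng = random.Random(seed)
    q, r = 0.0, 0.0
    for k in range(num_steps):
        fast, slow = step_sizes(k)
        dq, dr = profit_gradients(q, r)
        # Both agents update from the same shared-profit signal,
        # each with its own learning speed and its own gradient noise.
        q += slow * (dq + rng.gauss(0.0, noise_std))
        r += fast * (dr + rng.gauss(0.0, noise_std))
    return q, r


if __name__ == "__main__":
    q, r = train()
    print(f"slow agent q = {q:.3f}, fast agent r = {r:.3f}")
```

In this toy setting the fast variable tracks its best response to the slowly moving one, and both drift toward the joint maximizer of the shared objective. A real system would replace the scalar parameters with policy networks and the analytic gradients with model-free policy-gradient estimates.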
Jinyang Jiang
Peking University
Artificial Intelligence
Jinhui Han
Guanghua School of Management, Peking University, Beijing 100871, CHINA
Yijie Peng
Peking University
Simulation, Bayesian Learning, Artificial Intelligence, Healthcare, Financial Engineering
Ying Zhang
Guanghua School of Management, Peking University, Beijing 100871, CHINA