Improving After-sales Service: Deep Reinforcement Learning for Dynamic Time Slot Assignment with Commitments and Customer Preferences

📅 2025-09-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the joint optimization of dynamic time-slot assignment and service-engineer routing in OEM after-sales service for high-tech equipment, accounting for customer preferences, service-level agreements, and the need for prompt commitments. The problem is formalized as the Dynamic Time Slot Assignment Problem with Commitments and Customer Preferences (DTSAP-CCP), a hierarchical sequential decision-making problem. Methodologically, the authors propose an attention-based deep reinforcement learning (DRL) approach with rollout execution (ADRL-RE) and a scenario-based planning (SBP) approach that samples scenarios to guide slot assignment, and they train a neural heuristic solver that provides rapid route-planning solutions to accelerate learning in a combinatorial setting under uncertainty. Compared with rule-based heuristics and standard rollout baselines, ADRL-RE achieves significant improvements in solution quality and computational efficiency, while SBP offers stable performance. A case study on after-sales service for large medical equipment demonstrates the approach's effectiveness, robustness, and practicality in complex, real-world settings.

📝 Abstract
Problem definition: For original equipment manufacturers (OEMs), high-tech maintenance is a strategic component of after-sales services, involving close coordination between customers and service engineers. Each customer suggests several time slots for their maintenance task, from which the OEM must select one. This decision needs to be made promptly to support customers' planning. At the end of each day, routes for service engineers are planned to fulfill the tasks scheduled for the following day. We study this hierarchical and sequential decision-making problem, the Dynamic Time Slot Assignment Problem with Commitments and Customer Preferences (DTSAP-CCP), in this paper. Methodology/results: Two distinct approaches are proposed: 1) an attention-based deep reinforcement learning approach with rollout execution (ADRL-RE) and 2) a scenario-based planning approach (SBP). ADRL-RE combines a well-trained attention-based neural network with a rollout framework for online trajectory simulation. To support the training, we develop a neural heuristic solver that provides rapid route-planning solutions, enabling efficient learning in complex combinatorial settings. The SBP approach samples several scenarios to guide the time slot assignment. Numerical experiments demonstrate the superiority of ADRL-RE and the stability of SBP compared with both rule-based and rollout-based approaches. Furthermore, the strong practicality of ADRL-RE is verified in a case study of after-sales service for large medical equipment. Implications: This study provides OEMs with practical decision-support tools for dynamic maintenance scheduling, balancing customer preferences and operational efficiency. In particular, our ADRL-RE shows strong real-world potential, supporting timely and customer-aligned maintenance scheduling.
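The scenario-based planning idea described in the abstract (sample several future scenarios and commit to the time slot that looks best on average across them) can be sketched roughly as follows. This is a minimal illustration, not the paper's actual method: `simulate_scenario` is a hypothetical stand-in for whatever sampling-and-routing procedure estimates the downstream cost of committing to a slot.

```python
import random

def assign_time_slot(candidate_slots, simulate_scenario, n_scenarios=30, seed=0):
    """Pick the customer-suggested slot with the lowest average simulated cost.

    candidate_slots: time slots suggested by the customer.
    simulate_scenario: callable(slot, rng) -> cost; samples one future
        scenario (e.g. task arrivals) and returns the resulting routing cost.
    """
    rng = random.Random(seed)
    best_slot, best_cost = None, float("inf")
    for slot in candidate_slots:
        # Average the cost of committing to this slot over sampled scenarios.
        avg = sum(simulate_scenario(slot, rng) for _ in range(n_scenarios)) / n_scenarios
        if avg < best_cost:
            best_slot, best_cost = slot, avg
    return best_slot
```

In the paper, the cost of each sampled scenario would come from solving the resulting routing problem; here any black-box estimator can be plugged in.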
Problem

Research questions and friction points this paper is trying to address.

Selecting optimal maintenance time slots from customer preferences for OEMs
Planning service engineer routes efficiently for scheduled maintenance tasks
Balancing customer preferences with operational efficiency in dynamic scheduling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attention-based deep reinforcement learning with rollout execution
Neural heuristic solver for rapid route planning solutions
Scenario-based planning approach sampling multiple scenarios
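The rollout-execution idea listed above, in which a trained base policy simulates trajectories online before an action is committed, might look like this in skeleton form. This is a generic one-step rollout sketch, not the paper's implementation: `simulate_step` and `base_policy` are hypothetical stand-ins for the paper's environment simulator and attention-based neural network.

```python
def rollout_action(state, actions, base_policy, simulate_step, horizon=5):
    """One-step rollout: try each candidate action, then let the base policy
    act for `horizon` further steps, and return the cheapest first action.

    simulate_step: callable(state, action) -> (next_state, cost).
    base_policy:   callable(state) -> action (e.g. a trained network).
    """
    best_action, best_cost = None, float("inf")
    for action in actions:
        s, total = simulate_step(state, action)
        # Complete the trajectory with the base policy and accumulate cost.
        for _ in range(horizon):
            s, cost = simulate_step(s, base_policy(s))
            total += cost
        if total < best_cost:
            best_action, best_cost = action, total
    return best_action
```

Rollout of this kind typically improves on the base policy alone, at the price of extra online simulation; the paper's neural heuristic solver serves to make those simulations fast.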
Xiao Mao
School of Automation, Central South University; School of Industrial Engineering, Eindhoven University of Technology; Shenzhen Branch of China United Network Communications Co., Ltd.
Albert H. Schrotenboer
School of Industrial Engineering, Eindhoven University of Technology
Guohua Wu
School of Automation, Central South University
Willem van Jaarsveld
Associate Professor of Operations Research, Eindhoven University of Technology
Stochastic operations management · Supply Chain Management · Deep Reinforcement Learning