Generative Sequential Notification Optimization via Multi-Objective Decision Transformers

📅 2025-09-02
🤖 AI Summary
This study addresses sequential decision optimization in notification push systems, aiming to balance message utility and user fatigue. We propose a multi-objective offline reinforcement learning framework based on the Decision Transformer architecture. First, we employ quantile regression to model return-to-go, enhancing robustness in long-term reward estimation. Second, we design a non-episodic multi-reward mechanism that explicitly decouples utility and fatigue signals. Third, we develop a ring-buffer-based sequence processing system enabling near-real-time inference and interpretable analysis. By reformulating policy learning as conditional supervised learning, our approach achieves efficient multi-objective optimization in high-dimensional recommendation settings. Online A/B tests conducted in LinkedIn’s production environment demonstrate that our method improves session count by 0.72% over a baseline multi-objective Conservative Q-Learning (CQL) approach, while significantly enhancing notification relevance, user engagement, and long-term activity—without exacerbating user fatigue.
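Of the techniques listed above, the quantile-regression treatment of return-to-go is the most concrete. Below is a minimal sketch of what such conditioning could look like, assuming a PyTorch model and a standard pinball loss; `QuantileRTGHead`, its layer shape, and the quantile levels are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class QuantileRTGHead(nn.Module):
    """Hypothetical head predicting several quantiles of the return-to-go
    from a sequence-model embedding; names and sizes are illustrative."""
    def __init__(self, d_model: int, taus=(0.25, 0.5, 0.75)):
        super().__init__()
        # Register taus as a buffer so it moves with the module's device.
        self.register_buffer("taus", torch.tensor(taus))
        self.proj = nn.Linear(d_model, len(taus))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, d_model) -> (batch, n_quantiles)
        return self.proj(h)

def pinball_loss(pred: torch.Tensor, target: torch.Tensor,
                 taus: torch.Tensor) -> torch.Tensor:
    """Quantile (pinball) loss. pred: (B, Q), target: (B,), taus: (Q,).
    Under- and over-predictions are penalized asymmetrically, so each
    output converges to the corresponding quantile of the return."""
    diff = target.unsqueeze(-1) - pred  # (B, Q)
    return torch.mean(torch.maximum(taus * diff, (taus - 1.0) * diff))
```

Predicting a median or lower quantile rather than the mean makes the conditioning target less sensitive to rare, very long engagement streaks, which is one plausible reading of the claimed robustness in long-term reward estimation.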

📝 Abstract
Notifications are an important communication channel for delivering timely and relevant information. Optimizing their delivery involves addressing complex sequential decision-making challenges under constraints such as message utility and user fatigue. Offline reinforcement learning (RL) methods, such as Conservative Q-Learning (CQL), have been applied to this problem but face practical challenges at scale, including instability, sensitivity to distribution shifts, limited reproducibility, and difficulties with explainability in high-dimensional recommendation settings. We present a Decision Transformer (DT)-based framework that reframes policy learning as return-conditioned supervised learning, improving robustness, scalability, and modeling flexibility. Our contributions include a real-world comparison with CQL, a multi-reward design suitable for non-episodic tasks, a quantile regression approach to return-to-go conditioning, and a production-ready system with circular-buffer-based sequence processing for near-real-time inference. Extensive offline and online experiments in a deployed notification system show that our approach improves notification utility and overall session activity while minimizing user fatigue. Compared to a multi-objective CQL-based agent, the DT-based approach achieved a +0.72% increase in sessions for notification decision-making at LinkedIn by making notification recommendations more relevant.
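The circular-buffer sequence processing mentioned in the abstract can be illustrated in a few lines of Python. This is a sketch under assumptions: `UserSequenceBuffer`, its field names, and the context length are hypothetical, and a production system would back this with a real store rather than an in-process dict.

```python
from collections import deque

class UserSequenceBuffer:
    """Illustrative ring buffer of the most recent (state, action, reward)
    triples per user; deque(maxlen=K) evicts the oldest entry on append."""
    def __init__(self, context_len: int = 20):
        self.context_len = context_len
        self.buffers = {}  # user_id -> deque of (state, action, reward)

    def append(self, user_id, state, action, reward):
        buf = self.buffers.setdefault(
            user_id, deque(maxlen=self.context_len))
        buf.append((state, action, reward))

    def context(self, user_id):
        """Return the user's recent history as model-ready lists."""
        buf = self.buffers.get(user_id, ())
        states, actions, rewards = zip(*buf) if buf else ((), (), ())
        return list(states), list(actions), list(rewards)
```

A `deque` with `maxlen` behaves as a fixed-size ring buffer: appends are O(1) and the oldest interaction is discarded automatically, which is the property that keeps per-request context assembly cheap enough for near-real-time inference.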
Problem

Research questions and friction points this paper is trying to address.

Optimizing notification delivery under utility and fatigue constraints
Addressing instability and sensitivity in offline reinforcement learning methods
Improving robustness and scalability for real-world notification systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decision Transformer framework for policy learning
Multi-reward design for non-episodic notification tasks (see the sketch after this list)
Quantile regression approach for return-to-go conditioning
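To make the multi-reward and return-conditioning ideas above concrete, here is a minimal sketch of decoupled returns-to-go for separate utility and fatigue channels. The discount factor, channel names, and NumPy implementation are assumptions; the source states only that the two signals are explicitly decoupled rather than collapsed into one scalar.

```python
import numpy as np

def multi_objective_rtg(utility_rewards, fatigue_rewards,
                        gamma: float = 0.99):
    """Compute per-step returns-to-go separately for a utility channel and
    a fatigue channel, so the policy can be conditioned on a 2-D target
    return instead of a single collapsed scalar. gamma < 1 keeps the
    suffix sums bounded on non-episodic interaction streams."""
    def rtg(rewards):
        out, acc = np.empty(len(rewards)), 0.0
        for t in range(len(rewards) - 1, -1, -1):  # suffix sum, right to left
            acc = rewards[t] + gamma * acc
            out[t] = acc
        return out
    # Shape (T, 2): column 0 = utility-to-go, column 1 = fatigue-to-go.
    return np.stack([rtg(utility_rewards), rtg(fatigue_rewards)], axis=-1)

# Example: three notification decisions with decoupled signals.
# multi_objective_rtg([1.0, 0.0, 1.0], [0.0, 0.3, 0.0])
```

At training time these two columns would stand in for the scalar return-to-go token of a standard Decision Transformer input, so that at inference the agent can be asked for high utility and low fatigue at the same time.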
Authors

Borja Ocejo (LinkedIn, Mountain View, USA)
Ruofan Wang (LinkedIn, Mountain View, USA)
Ke Liu (LinkedIn, Mountain View, USA; currently at Pinterest)
Rohit K. Patra (LinkedIn, Mountain View, USA)
Haotian Shen (Hybrid Systems Lab, UC Berkeley)
David Liu (LinkedIn, Mountain View, USA)
Yiwen Yuan (LinkedIn, Mountain View, USA)
Gokulraj Mohanasundaram (LinkedIn, Mountain View, USA)
Fedor Borisyuk (LinkedIn)
Prakruthi Prabhakar (LinkedIn, Mountain View, USA)