🤖 AI Summary
This study addresses sequential decision optimization in notification push systems, aiming to balance message utility and user fatigue. We propose a multi-objective offline reinforcement learning framework based on the Decision Transformer architecture. First, we employ quantile regression to model return-to-go, enhancing robustness in long-term reward estimation. Second, we design a non-episodic multi-reward mechanism that explicitly decouples utility and fatigue signals. Third, we develop a ring-buffer-based sequence processing system enabling near-real-time inference and interpretable analysis. By reformulating policy learning as conditional supervised learning, our approach achieves efficient multi-objective optimization in high-dimensional recommendation settings. Online A/B tests conducted in LinkedIn’s production environment demonstrate that our method improves session count by 0.72% over a baseline multi-objective Conservative Q-Learning (CQL) approach, while significantly enhancing notification relevance, user engagement, and long-term activity—without exacerbating user fatigue.
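The quantile-regression treatment of return-to-go can be illustrated with the standard pinball loss. The summary does not specify the exact formulation used in the paper, so the sketch below is a generic, minimal version: it estimates a single quantile of observed returns by subgradient descent on the pinball loss, which is what makes the conditioning target robust to heavy-tailed return distributions. All names (`pinball_loss`, `fit_quantile`) and the synthetic exponential returns are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def pinball_loss(errors, tau):
    # Pinball (quantile) loss over errors = y_true - y_pred.
    # Penalizes under-prediction by tau and over-prediction by (1 - tau),
    # so its minimizer is the tau-quantile rather than the mean.
    return np.mean(np.maximum(tau * errors, (tau - 1) * errors))

def fit_quantile(returns, tau, lr=1.0, steps=3000):
    # Estimate the tau-quantile of observed returns by subgradient
    # descent on the pinball loss (scalar parameter q).
    q = 0.0
    for _ in range(steps):
        # Subgradient: -tau where return > q, (1 - tau) otherwise.
        grad = np.mean(np.where(returns > q, -tau, 1 - tau))
        q -= lr * grad
    return q

# Heavy-tailed synthetic "returns" stand in for long-horizon rewards.
rng = np.random.default_rng(0)
returns = rng.exponential(scale=10.0, size=5000)
q90 = fit_quantile(returns, tau=0.9)  # robust return-to-go target
```

Conditioning on an upper quantile of return-to-go (rather than the mean) gives the Decision Transformer an optimistic but attainable target, which is one plausible motivation for a quantile formulation.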
📝 Abstract
Notifications are an important communication channel for delivering timely and relevant information. Optimizing their delivery involves addressing complex sequential decision-making challenges under constraints such as message utility and user fatigue. Offline reinforcement learning (RL) methods, such as Conservative Q-Learning (CQL), have been applied to this problem but face practical challenges at scale, including instability, sensitivity to distribution shifts, limited reproducibility, and difficulties with explainability in high-dimensional recommendation settings. We present a Decision Transformer (DT) based framework that reframes policy learning as return-conditioned supervised learning, improving robustness, scalability, and modeling flexibility. Our contributions include a real-world comparison with CQL, a multi-reward design suitable for non-episodic tasks, a quantile regression approach to return-to-go conditioning, and a production-ready system with circular-buffer-based sequence processing for near-real-time inference. Extensive offline and online experiments in a deployed notification system show that our approach improves notification utility and overall session activity while minimizing user fatigue. Compared to a multi-objective CQL-based agent, the DT-based approach achieved a +0.72% increase in sessions for notification decision-making at LinkedIn by making notification recommendations more relevant.
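The circular-buffer sequence processing mentioned above can be sketched with a fixed-capacity buffer that keeps only the most recent trajectory steps per user, evicting the oldest step once the transformer's context window is full. The abstract does not describe the production system's internals, so this is a minimal sketch under assumed names (`UserSequenceBuffer`, a `(return_to_go, state, action)` step layout) using Python's `collections.deque`.

```python
from collections import deque

class UserSequenceBuffer:
    """Fixed-capacity ring buffer of recent (return_to_go, state, action) steps.

    deque(maxlen=...) gives O(1) appends with automatic eviction of the
    oldest step, so a per-user buffer always holds at most one context
    window of trajectory for near-real-time DT inference.
    """

    def __init__(self, context_len=20):
        self.steps = deque(maxlen=context_len)

    def append(self, rtg, state, action):
        # Record one decision step; the oldest step is dropped if full.
        self.steps.append((rtg, state, action))

    def context(self):
        # Trajectory oldest-first, ready to tokenize for the transformer.
        return list(self.steps)

# Usage: stream decision steps in; the buffer retains only the window.
buf = UserSequenceBuffer(context_len=20)
buf.append(3.2, state={"badge_count": 4}, action="send")
```

Bounding per-user state to one context window keeps memory constant and makes the retained sequence directly inspectable, which is consistent with the interpretability and near-real-time claims above.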