🤖 AI Summary
Reinforcement learning–based recommender systems (RLRS) struggle to model complex user–item interactions under sparse user feedback and suboptimal historical data. Method: This paper proposes MDT4Rec, a novel framework that (1) relocates trajectory concatenation to the action inference stage, enabling context-aware dynamic trajectory pruning, and (2) initializes the Decision Transformer with a pretrained large language model, replacing linear embeddings with MLPs and adopting LoRA for parameter-efficient fine-tuning. Contribution/Results: Evaluated on five public benchmark datasets and an online simulation environment, MDT4Rec consistently outperforms state-of-the-art baselines. It demonstrates superior effectiveness and robustness in sequential recommendation under sparse feedback regimes, validating its capacity to leverage limited interaction signals for improved decision-making.
📝 Abstract
Reinforcement Learning-based recommender systems (RLRS) offer an effective way to handle sequential recommendation tasks but often face difficulties in real-world settings, where user feedback data can be suboptimal or sparse. In this paper, we introduce MDT4Rec, an offline RLRS framework that builds on the Decision Transformer (DT) to address two major challenges: learning from suboptimal histories and representing complex user-item interactions. First, MDT4Rec shifts the trajectory stitching procedure from the training phase to action inference, allowing the system to shorten its historical context when necessary and thereby ignore negative or unsuccessful past experiences. Second, MDT4Rec initializes DT with a pre-trained large language model (LLM) for knowledge transfer, replaces linear embedding layers with Multi-Layer Perceptrons (MLPs) for more flexible representations, and employs Low-Rank Adaptation (LoRA) to efficiently fine-tune only a small subset of parameters. We evaluate MDT4Rec on five public datasets and in an online simulation environment, demonstrating that it outperforms existing methods.
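To make the parameter-efficiency claim concrete, the LoRA idea used for fine-tuning can be sketched as follows. This is a minimal NumPy illustration of the general LoRA mechanism, not the paper's implementation: the shapes, rank, and scaling are assumed values, and a frozen weight `W` stands in for a pre-trained LLM layer. LoRA keeps `W` fixed and trains only a low-rank delta `A @ B`:

```python
import numpy as np

class LoRALinear:
    """Illustrative LoRA-adapted linear layer (NumPy sketch, not MDT4Rec code).

    The pretrained weight W is frozen; only the low-rank factors A and B
    would receive gradient updates during fine-tuning.
    """

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_in, d_out)) * 0.02  # frozen "pretrained" weight
        self.A = rng.standard_normal((d_in, r)) * 0.02      # trainable low-rank factor
        self.B = np.zeros((r, d_out))                       # zero-init: adapter delta starts at 0
        self.scale = alpha / r

    def __call__(self, x):
        # y = x W + (alpha / r) * x A B
        return x @ self.W + self.scale * (x @ self.A) @ self.B

    def trainable_params(self):
        return self.A.size + self.B.size

    def frozen_params(self):
        return self.W.size


layer = LoRALinear(d_in=768, d_out=768, r=8)
x = np.ones((4, 768))
y = layer(x)
print(y.shape)                   # (4, 768)
print(layer.trainable_params())  # 12288 trainable
print(layer.frozen_params())     # 589824 frozen
```

With a rank of 8 on a 768x768 layer, the adapter trains roughly 2% of the parameters the full weight matrix holds, which is the kind of reduction that makes fine-tuning an LLM-initialized DT tractable.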