DecisionLLM: Large Language Models for Long Sequence Decision Exploration

📅 2026-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of traditional reinforcement learning in handling long-horizon sequential decision-making within complex dynamic environments—such as real-time bidding in computational advertising—and the difficulty of large language models (LLMs) in modeling continuous numerical values. To overcome these challenges, the authors propose DecisionLLM, a novel framework that treats trajectory data as an independent modality and aligns it with natural language task descriptions, enabling LLMs to autoregressively predict future decisions. This approach marks the first effective application of LLMs to offline long-sequence decision tasks. Experimental results demonstrate that DecisionLLM-3B outperforms Decision Transformer by 69.4 on Maze2D umaze-v1 and by 0.085 on the AuctionNet benchmark, while also revealing clear scaling laws with respect to model size, data volume, and data quality.

📝 Abstract
Long-sequence decision-making, which is usually addressed through reinforcement learning (RL), is a critical component for optimizing strategic operations in dynamic environments, such as real-time bidding in computational advertising. The Decision Transformer (DT) introduced a powerful paradigm by framing RL as an autoregressive sequence modeling problem. Concurrently, Large Language Models (LLMs) have demonstrated remarkable success in complex reasoning and planning tasks. This inspires us to ask whether LLMs, which share the same Transformer foundation but operate at a much larger scale, can unlock new levels of performance on long-horizon sequential decision-making problems. This work investigates the application of LLMs to offline decision-making tasks. A fundamental challenge in this domain is the LLMs' inherent inability to interpret continuous values, as they lack a native understanding of numerical magnitude and order when values are represented as text strings. To address this, we propose treating trajectories as a distinct modality. By learning to align trajectory data with natural language task descriptions, our model can autoregressively predict future decisions within a cohesive framework we term DecisionLLM. We establish a set of scaling laws governing this paradigm, demonstrating that performance hinges on three factors: model scale, data volume, and data quality. In offline experimental benchmarks and bidding scenarios, DecisionLLM achieves strong performance. Specifically, DecisionLLM-3B outperforms the traditional Decision Transformer (DT) by 69.4 on Maze2D umaze-v1 and by 0.085 on AuctionNet. It extends the AIGB paradigm and points to promising directions for future exploration in online bidding.
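The trajectory-as-modality idea in the abstract can be sketched minimally: continuous (return-to-go, state, action) values are mapped into the LLM's embedding space by learned linear projections rather than being spelled out as digit strings, then interleaved DT-style and concatenated with the embedded task description. All dimensions, weights, and the interleaving order below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): the LLM's hidden size
# and the raw widths of the trajectory's continuous fields.
D_MODEL = 16
STATE_DIM, ACTION_DIM = 4, 2

# One learned linear projection per continuous modality, so raw numbers
# enter the model as vectors, never as text tokens.
W_state = rng.normal(size=(STATE_DIM, D_MODEL))
W_action = rng.normal(size=(ACTION_DIM, D_MODEL))
W_return = rng.normal(size=(1, D_MODEL))

def embed_trajectory(returns, states, actions):
    """Interleave (return-to-go, state, action) embeddings per timestep."""
    tokens = []
    for g, s, a in zip(returns, states, actions):
        tokens.append(np.atleast_1d(g) @ W_return)
        tokens.append(s @ W_state)
        tokens.append(a @ W_action)
    return np.stack(tokens)  # shape: (3 * T, D_MODEL)

# A toy 5-step trajectory.
T = 5
traj = embed_trajectory(
    returns=rng.normal(size=T),
    states=rng.normal(size=(T, STATE_DIM)),
    actions=rng.normal(size=(T, ACTION_DIM)),
)

# The natural-language task description would use the LLM's own token
# embeddings; a stand-in 3-token prompt embedding suffices here.
prompt = rng.normal(size=(3, D_MODEL))

# The aligned sequence the LLM would consume autoregressively,
# predicting the next action token at each step.
sequence = np.concatenate([prompt, traj])
print(sequence.shape)  # -> (18, 16)
```

In a real system the projections would be trained jointly with (or adapted into) the LLM, and the model's head would decode the next action embedding back into a continuous action.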
Problem

Research questions and friction points this paper is trying to address.

Long-sequence decision-making
Large Language Models
Offline decision making
Continuous values
Numerical understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

DecisionLLM
long-horizon decision-making
trajectory-language alignment
offline reinforcement learning
scaling laws
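On the scaling-laws contribution: such laws are conventionally summarized as power laws, which reduce to linear regression in log-log space. A minimal sketch of that fitting procedure on synthetic numbers (the functional form, coefficient, and exponent below are illustrative assumptions, not results from the paper):

```python
import numpy as np

def fit_power_law(n, loss):
    """Fit loss ≈ a * n**(-b) by linear regression in log-log space."""
    slope, intercept = np.polyfit(np.log(n), np.log(loss), deg=1)
    return np.exp(intercept), -slope

# Synthetic illustration: loss = 2.0 * N^-0.3, with N playing the role
# of a scaled factor such as parameter count or training-data volume.
n = np.array([1e8, 3e8, 1e9, 3e9])
loss = 2.0 * n ** -0.3

a, b = fit_power_law(n, loss)
print(round(a, 3), round(b, 3))  # -> 2.0 0.3
```

The same fit applies to each of the three factors the paper names (model scale, data volume, data quality), holding the other two fixed.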
Xiaowei Lv
Renmin University of China, Beijing, China
Zhilin Zhang
Alibaba Group, Beijing, China
Yijun Li
Adobe Research
Computer Vision
Yusen Huo
Alibaba Group, Beijing, China
Siyuan Ju
Alibaba Group, Beijing, China
Xuyan Li
Alibaba Group, Beijing, China
Chunxiang Hong
Alibaba Group, Beijing, China
Tianyu Wang
Alibaba Group, Beijing, China
Yongcai Wang
Renmin University of China, Beijing, China
Peng Sun
Alibaba Group, Beijing, China
Chuan Yu
Alibaba Group, Beijing, China
Jian Xu
Senior Director, Ad Platform, Alibaba Group
Computational Advertising, Machine Learning, Data Mining, Data Privacy
Bo Zheng
Researcher, Alibaba Group
AI, Network, E-Commerce