DONUT: A Decoder-Only Model for Trajectory Prediction

📅 2025-06-07
🤖 AI Summary
This paper addresses motion prediction of surrounding agents for autonomous driving and proposes DONUT, a decoder-only architecture that handles both historical state encoding and future trajectory generation in a single model, departing from the conventional encoder-decoder paradigm. The model is a decoder-only Transformer that predicts autoregressively, so every iterative prediction step is conditioned on up-to-date information. It adds an 'overprediction' strategy, an auxiliary task of predicting trajectories at longer temporal horizons that helps the model capture long-horizon motion trends, and it uses relative-coordinate positional encoding. On the Argoverse 2 single-agent motion forecasting benchmark, DONUT outperforms the encoder-decoder baseline and achieves new state-of-the-art results. These results support the decoder-only paradigm as a streamlined alternative to encoder-decoder designs for motion forecasting.

📝 Abstract
Predicting the motion of other agents in a scene is highly relevant for autonomous driving, as it allows a self-driving car to anticipate. Inspired by the success of decoder-only models for language modeling, we propose DONUT, a Decoder-Only Network for Unrolling Trajectories. Different from existing encoder-decoder forecasting models, we encode historical trajectories and predict future trajectories with a single autoregressive model. This allows the model to make iterative predictions in a consistent manner, and ensures that the model is always provided with up-to-date information, enhancing the performance. Furthermore, inspired by multi-token prediction for language modeling, we introduce an 'overprediction' strategy that gives the network the auxiliary task of predicting trajectories at longer temporal horizons. This allows the model to better anticipate the future, and further improves the performance. With experiments, we demonstrate that our decoder-only approach outperforms the encoder-decoder baseline, and achieves new state-of-the-art results on the Argoverse 2 single-agent motion forecasting benchmark.
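
The autoregressive unrolling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `predict_next_chunk` is a hypothetical stand-in that uses constant-velocity extrapolation where DONUT would use a learned decoder, and the names `unroll`, `chunk_len`, and `horizon` are assumptions made for illustration. The point it shows is the loop structure, in which each predicted chunk is appended to the sequence and fed back as input for the next step.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def predict_next_chunk(history: List[Point], chunk_len: int) -> List[Point]:
    """Hypothetical stand-in for the learned decoder.

    Extrapolates the last observed displacement at constant velocity.
    """
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * k, y1 + vy * k) for k in range(1, chunk_len + 1)]


def unroll(history: List[Point], horizon: int, chunk_len: int) -> List[Point]:
    """Autoregressively extend `history` by `horizon` steps, chunk by chunk."""
    sequence = list(history)
    while len(sequence) - len(history) < horizon:
        chunk = predict_next_chunk(sequence, chunk_len)
        # Predictions are fed back, so the next step sees the latest state.
        sequence.extend(chunk)
    return sequence[len(history):len(history) + horizon]


observed = [(0.0, 0.0), (1.0, 0.5)]
future = unroll(observed, horizon=6, chunk_len=2)
```

Because the same function consumes observed and predicted points alike, past and future are processed in one consistent way, which is the property the abstract attributes to the decoder-only design.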
Problem

Research questions and friction points this paper is trying to address.

Predicting agent motion for autonomous driving safety
Replacing encoder-decoder models with decoder-only trajectory prediction
Improving forecasting via multi-horizon overprediction strategy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoder-only model for trajectory prediction
Single autoregressive model for encoding and predicting
Overprediction strategy for longer horizon forecasting
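
One way to picture overprediction as an auxiliary task is a training objective with two terms: the error on the next sub-trajectory plus a weighted error on a further-ahead sub-trajectory. The sketch below is an assumption-laden illustration, not the paper's loss: this page does not state the actual objective, and `overprediction_loss`, `l2_error`, and the weight `aux_weight` are hypothetical names and values.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def l2_error(pred: List[Point], target: List[Point]) -> float:
    """Mean Euclidean distance between predicted and target points."""
    total = sum(
        ((px - tx) ** 2 + (py - ty) ** 2) ** 0.5
        for (px, py), (tx, ty) in zip(pred, target)
    )
    return total / len(target)


def overprediction_loss(
    pred_next: List[Point],
    pred_over: List[Point],
    gt_next: List[Point],
    gt_over: List[Point],
    aux_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Main next-chunk loss plus an auxiliary longer-horizon loss."""
    main_term = l2_error(pred_next, gt_next)
    aux_term = l2_error(pred_over, gt_over)
    return main_term + aux_weight * aux_term


loss = overprediction_loss(
    pred_next=[(1.0, 0.0), (2.0, 0.0)],
    pred_over=[(3.0, 0.0), (4.0, 0.0)],
    gt_next=[(1.0, 0.0), (2.0, 0.0)],
    gt_over=[(3.0, 4.0), (4.0, 3.0)],
)
```

In this toy case the next-chunk prediction is exact, so the whole loss comes from the auxiliary term; this shows how the longer-horizon target can still provide a training signal when the immediate prediction is already correct.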