MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

📅 2025-03-13
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses the multimodal modeling challenge in human trajectory prediction with an efficient one-step generative approach. It introduces: (1) a flow-matching loss that jointly optimizes single-sample accuracy and multi-sample diversity; (2) a novel implicit maximum likelihood estimation (IMLE)-based distillation framework for flow models that requires only samples from the teacher, with no explicit density evaluation; and (3) a conditional flow-matching architecture that jointly models historical trajectories and scene context and generates K physically plausible and socially compliant future trajectories in a single forward pass. On the SportVU, ETH-UCY, and SDD benchmarks, both the teacher and the distilled student achieve state-of-the-art performance, and the one-step student samples 100× faster than the teacher.

📝 Abstract
In this paper, we address the problem of human trajectory forecasting, which aims to predict the inherently multi-modal future movements of humans based on their past trajectories and other contextual cues. We propose a novel motion prediction conditional flow matching model, termed MoFlow, to predict K-shot future trajectories for all agents in a given scene. We design a novel flow matching loss function that not only ensures at least one of the $K$ sets of future trajectories is accurate but also encourages all $K$ sets of future trajectories to be diverse and plausible. Furthermore, by leveraging the implicit maximum likelihood estimation (IMLE), we propose a novel distillation method for flow models that only requires samples from the teacher model. Extensive experiments on the real-world datasets, including SportVU NBA games, ETH-UCY, and SDD, demonstrate that both our teacher flow model and the IMLE-distilled student model achieve state-of-the-art performance. These models can generate diverse trajectories that are physically and socially plausible. Moreover, our one-step student model is $\textbf{100}$ times faster than the teacher flow model during sampling. The code, model, and data are available at our project page: https://moflow-imle.github.io
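
To make the phrase "at least one of the $K$ sets of future trajectories is accurate" concrete, here is a minimal sketch of a best-of-K (winner-takes-all) error for a single agent. It is an illustrative assumption, not the loss defined in the paper: the function name `best_of_k_error`, the array shapes, and the use of mean per-step L2 distance are all hypothetical choices.

```python
import numpy as np

def best_of_k_error(pred, target):
    """Hypothetical helper, not from the paper.

    pred:   (K, T, 2) array of K candidate future trajectories
    target: (T, 2) array with the observed future trajectory
    Returns the index of the closest candidate and its mean per-step L2 error.
    """
    # L2 distance at every future step, averaged over the T steps -> shape (K,)
    errs = np.linalg.norm(pred - target[None], axis=-1).mean(axis=-1)
    k = int(np.argmin(errs))  # only the best candidate is scored
    return k, float(errs[k])

# Three candidates that are shifted copies of a two-step ground truth.
target = np.array([[0.0, 0.0], [1.0, 0.0]])
pred = np.stack([
    target + np.array([0.0, 3.0]),
    target + np.array([0.0, 1.0]),
    target + np.array([4.0, 0.0]),
])
best_index, best_error = best_of_k_error(pred, target)
```

Scoring only the closest candidate leaves the other K-1 free to cover different modes; the abstract states that the actual loss additionally encourages all K candidates to be diverse and plausible, which this sketch does not model.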
Problem

Research questions and friction points this paper is trying to address.

Predict multi-modal human future movements using past trajectories.
Develop a flow matching model for accurate and diverse trajectory predictions.
Enhance model efficiency with a fast, IMLE-based distillation method.
Innovation

Methods, ideas, or system contributions that make the work stand out.

MoFlow: motion prediction via flow matching
IMLE-based distillation for efficient sampling
Ensures diverse, plausible K-shot trajectory predictions
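
The idea of distilling from teacher samples alone can be sketched with a generic IMLE-style objective: for every teacher sample, find the nearest of several student samples and penalize only that distance. This is a simplified, unconditional sketch under stated assumptions, not the paper's training procedure; the name `imle_loss`, the flattened `(n, d)` sample layout, and the squared Euclidean distance are hypothetical.

```python
import numpy as np

def imle_loss(teacher_samples, student_samples):
    """Hypothetical IMLE-style objective, not the paper's exact loss.

    teacher_samples: (n, d) samples drawn from the teacher model
    student_samples: (m, d) one-step samples from the student,
                     each produced from a different noise draw
    Returns the mean squared distance from each teacher sample to its
    nearest student sample, plus the indices of those nearest samples.
    """
    # Pairwise squared distances between teacher and student samples -> (n, m)
    diff = teacher_samples[:, None, :] - student_samples[None, :, :]
    d2 = (diff ** 2).sum(axis=-1)
    nearest = d2.argmin(axis=1)  # closest student sample per teacher sample
    loss = d2[np.arange(len(teacher_samples)), nearest].mean()
    return float(loss), nearest

teacher = np.array([[0.0, 0.0], [10.0, 10.0]])
student = np.array([[1.0, 0.0], [9.0, 10.0], [5.0, 5.0]])
loss, nearest = imle_loss(teacher, student)
```

Only teacher samples enter the computation, never teacher densities, which matches the abstract's statement that the method "only requires samples from the teacher model". Because every teacher sample must have some student sample nearby, the objective discourages the student from collapsing onto a single mode.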