MoFlow: One-Step Flow Matching for Human Trajectory Forecasting via Implicit Maximum Likelihood Estimation based Distillation

📅 2025-03-13

🤖 AI Summary

This paper addresses multimodal human trajectory prediction with a generative approach that aims to be both accurate and efficient. Methodologically, it introduces: (1) a flow-matching loss that ensures at least one of the K predicted futures is accurate while encouraging all K to be diverse and plausible; (2) a novel implicit maximum likelihood estimation (IMLE)-based distillation method for flow models that requires only samples from the teacher, with no explicit density evaluation; and (3) MoFlow, a conditional flow-matching model that predicts K future trajectories for all agents in a scene from their past trajectories and other contextual cues. Evaluated on the SportVU NBA, ETH-UCY, and SDD benchmarks, both the teacher flow model and the IMLE-distilled student achieve state-of-the-art performance and generate physically and socially plausible trajectories. The distilled one-step student samples 100× faster than the teacher.
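A minimal sketch of two ingredients behind point (1), under simplifying assumptions: the linear interpolation path between a noise sample and the ground-truth future used in standard conditional flow matching, and a winner-take-all (best-of-K) regression term that penalizes only the closest of the K predicted trajectories. The names `interpolate` and `best_of_k_loss`, the list-of-points trajectory format, and regressing clean trajectories rather than velocities are all hypothetical choices for illustration. The summary does not give MoFlow's exact loss, and its diversity and plausibility terms are not reproduced here.

```python
# Hypothetical trajectory format: a list of (x, y) positions over T future steps.
Traj = list[tuple[float, float]]


def interpolate(noise: Traj, target: Traj, t: float) -> Traj:
    """Point on the linear path x_t = (1 - t) * x0 + t * x1 used by conditional flow matching."""
    return [((1 - t) * nx + t * tx, (1 - t) * ny + t * ty)
            for (nx, ny), (tx, ty) in zip(noise, target)]


def mse(a: Traj, b: Traj) -> float:
    """Mean over time steps of the squared Euclidean distance between two trajectories."""
    return sum((ax - bx) ** 2 + (ay - by) ** 2
               for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def best_of_k_loss(predictions: list[Traj], ground_truth: Traj) -> tuple[float, int]:
    """Winner-take-all loss: only the prediction closest to the ground truth is penalized.

    In a hypothetical training loop, each of the K predictions would come from a network
    fed a different noisy input interpolate(noise_k, ground_truth, t); here they are
    passed in directly.
    """
    errors = [mse(p, ground_truth) for p in predictions]
    best = min(range(len(errors)), key=errors.__getitem__)
    return errors[best], best


if __name__ == "__main__":
    gt = [(0.0, 0.0), (1.0, 1.0)]
    candidates = [[(1.0, 0.0), (2.0, 1.0)], [(0.0, 0.0), (1.0, 1.5)]]
    print(best_of_k_loss(candidates, gt))  # second candidate is closer
```

Because only the best hypothesis is penalized, the other K - 1 are not pulled toward the same ground truth and remain free to cover other plausible futures; this is the usual rationale for winner-take-all terms in multimodal prediction.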

📝 Abstract

In this paper, we address the problem of human trajectory forecasting, which aims to predict the inherently multi-modal future movements of humans based on their past trajectories and other contextual cues. We propose a novel motion prediction conditional flow matching model, termed MoFlow, to predict $K$-shot future trajectories for all agents in a given scene. We design a novel flow matching loss function that not only ensures at least one of the $K$ sets of future trajectories is accurate but also encourages all $K$ sets of future trajectories to be diverse and plausible. Furthermore, by leveraging implicit maximum likelihood estimation (IMLE), we propose a novel distillation method for flow models that only requires samples from the teacher model. Extensive experiments on real-world datasets, including SportVU NBA games, ETH-UCY, and SDD, demonstrate that both our teacher flow model and the IMLE-distilled student model achieve state-of-the-art performance. These models can generate diverse trajectories that are physically and socially plausible. Moreover, our one-step student model is $\textbf{100}$ times faster than the teacher flow model during sampling. The code, model, and data are available at our project page: https://moflow-imle.github.io
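The distillation described above needs only samples from the teacher. A minimal sketch of the IMLE matching objective, assuming squared Euclidean distance on flattened trajectories: teacher samples play the role of data, each is paired with the nearest of a batch of student samples, and that student sample is pulled toward it. `imle_loss`, `imle_step_linear`, and the toy linear generator are hypothetical illustrations, not MoFlow's code; in practice the student would be a neural network conditioned on the scene and trained with automatic differentiation.

```python
# Hypothetical format: each sample is a flattened trajectory, a tuple of floats.
Sample = tuple[float, ...]


def sq_dist(a: Sample, b: Sample) -> float:
    """Squared Euclidean distance between two flattened samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def imle_loss(teacher: list[Sample], student: list[Sample]) -> tuple[float, list[int]]:
    """IMLE objective: match every teacher sample to its nearest student sample.

    Matching in this direction means each teacher sample needs some student sample
    near it, which discourages the student from dropping modes. No density is evaluated.
    """
    matches = [min(range(len(student)), key=lambda j: sq_dist(student[j], x)) for x in teacher]
    loss = sum(sq_dist(student[j], x) for j, x in zip(matches, teacher)) / len(teacher)
    return loss, matches


def imle_step_linear(a: float, b: float, z: list[float],
                     teacher: list[Sample], lr: float) -> tuple[float, float]:
    """One gradient step for a toy 1-D student g(z) = a * z + b (a stand-in for a network).

    The nearest-neighbour assignment is held fixed while differentiating, as in IMLE.
    """
    student = [(a * zi + b,) for zi in z]
    _, matches = imle_loss(teacher, student)
    n = len(teacher)
    residuals = [student[j][0] - x[0] for j, x in zip(matches, teacher)]
    grad_a = sum(2 * r * z[j] for r, j in zip(residuals, matches)) / n
    grad_b = sum(2 * r for r in residuals) / n
    return a - lr * grad_a, b - lr * grad_b
```

Calling `imle_step_linear` repeatedly, with fresh latent draws `z` each time, gives the basic IMLE training loop for this toy student.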

Problem

Research questions and friction points this paper is trying to address.

Predict the multi-modal future movements of humans from their past trajectories and other contextual cues.

Develop a flow matching model for accurate and diverse trajectory predictions.

Reduce sampling cost by distilling the flow model into a one-step student with an IMLE-based method.

Innovation

Methods, ideas, or system contributions that make the work stand out.

MoFlow: motion prediction via flow matching

IMLE-based distillation for efficient sampling

A flow-matching loss that keeps at least one of the K predicted trajectories accurate while making all K diverse and plausible
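The "one-step" in the title refers to sampling cost. As a rough illustration (not MoFlow's sampler), a flow model is sampled by integrating its velocity field, for example with explicit Euler, and each step costs one network evaluation, whereas a distilled student maps noise to a trajectory with a single evaluation. `euler_sample` and `straight_line_velocity` are hypothetical names, and the summary does not state which ODE solver or how many steps the teacher uses.

```python
from typing import Callable

Vec = list[float]
Velocity = Callable[[Vec, float], Vec]


def euler_sample(velocity: Velocity, x0: Vec, num_steps: int) -> tuple[Vec, int]:
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with explicit Euler.

    Returns the final state and the number of function (network) evaluations, NFE.
    """
    x = list(x0)
    dt = 1.0 / num_steps
    nfe = 0
    for i in range(num_steps):
        v = velocity(x, i * dt)
        nfe += 1
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x, nfe


def straight_line_velocity(target: Vec) -> Velocity:
    """Idealized field whose trajectories are straight lines ending at `target` at t = 1."""
    def v(x: Vec, t: float) -> Vec:
        # t < 1 on every Euler step above, so the division is safe.
        return [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
    return v
```

With this idealized straight-line field even `num_steps=1` reaches the target. A learned velocity field is generally curved, so few-step Euler sampling loses accuracy; that gap is what a distilled one-step student is meant to close without paying for many evaluations.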
