🤖 AI Summary
Multimodal human trajectory prediction (HTP) models lack a definitive test oracle, because many future trajectories can be plausible for the same scenario. This makes rigorous validation difficult.
Method: We propose the first metamorphic testing (MT) framework tailored for oracle-free HTP evaluation. It defines five metamorphic relations based on input transformations, three geometric (mirroring, rotation, rescaling) and two semantic (changing semantic class labels, obstacle injection), and measures output distribution consistency using the Wasserstein or Hellinger distance.
Contribution/Results: By adapting metamorphic testing to stochastic, multimodal trajectory predictors, our framework enables systematic assessment of model robustness and behavioral consistency under input perturbations and environmental changes. Experiments demonstrate its effectiveness in detecting anomalous model responses, providing a scalable, reproducible, and automated validation tool for HTP models when ground-truth trajectories are unavailable.
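As an illustration of what a distribution-level consistency check could look like, the following minimal sketch computes the Hellinger distance between two sets of predicted trajectory endpoints after binning them on a fixed grid. The function names `hellinger` and `endpoint_histogram`, the grid extent, and the Gaussian sample data are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions, in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.sum() <= 0 or q.sum() <= 0:
        raise ValueError("both histograms must contain at least one sample")
    p = p / p.sum()  # normalize counts to probabilities
    q = q / q.sum()
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))

def endpoint_histogram(endpoints, bins, extent):
    """Flattened 2D histogram of endpoints (shape (k, 2)) over a fixed grid."""
    hist, _, _ = np.histogram2d(
        endpoints[:, 0], endpoints[:, 1], bins=bins, range=extent
    )
    return hist.ravel()

# Hypothetical data: endpoints sampled before and after a scene change
# that shifts the predicted distribution by two units along x.
rng = np.random.default_rng(0)
extent = [[-5.0, 5.0], [-5.0, 5.0]]  # assumed scene bounds
before = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(2000, 2))
after = rng.normal(loc=[2.0, 0.0], scale=1.0, size=(2000, 2))

h_before = endpoint_histogram(before, 10, extent)
h_after = endpoint_histogram(after, 10, extent)
d_same = hellinger(h_before, h_before)   # identical distributions
d_shift = hellinger(h_before, h_after)   # shifted distribution
```

The distance is zero for identical histograms and approaches one as the two distributions stop overlapping, so a fixed threshold on it can serve as a violation criterion. The bin count and extent affect the result and would need to be chosen per scene.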
📝 Abstract
Context: Predicting human trajectories is crucial for the safety and reliability of autonomous systems, such as automated vehicles and mobile robots. However, rigorously testing the underlying multimodal Human Trajectory Prediction (HTP) models is challenging: they typically use multiple input sources (e.g., trajectory history and environment maps) and produce stochastic outputs (multiple possible future paths). The primary difficulty is the absence of a definitive test oracle, since numerous future trajectories may be plausible for any given scenario.
Objectives: This research applies Metamorphic Testing (MT) as a systematic methodology for testing multimodal HTP systems. We address the oracle problem through metamorphic relations (MRs) adapted to the complexity and stochastic nature of HTP.
Methods: We present five MRs that target transformations of both the historical trajectory data and the semantic segmentation maps used as environmental context. These MRs comprise (1) label-preserving geometric transformations (mirroring, rotation, rescaling) applied to both trajectory and map inputs, where outputs are expected to transform correspondingly, and (2) map-altering transformations (changing semantic class labels, introducing obstacles) with predictable changes in trajectory distributions. We propose probabilistic violation criteria based on distance metrics between probability distributions, such as the Wasserstein or Hellinger distance.
Conclusion: This study introduces tool, an MT framework for the oracle-less testing of multimodal, stochastic HTP systems. It allows the assessment of model robustness against input transformations and contextual changes without relying on ground-truth trajectories.
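The mirroring relation described in the Methods can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_predictor` is a hypothetical stochastic model (constant velocity plus Gaussian noise) that ignores the map input, the distance is a per-axis 1D Wasserstein distance on sampled endpoints rather than a full 2D one, and the threshold of 0.2 is an arbitrary illustrative value.

```python
import numpy as np

def mirror_x(points):
    """Reflect 2D points across the y-axis (x -> -x)."""
    out = np.array(points, dtype=float)  # copy so the input is not modified
    out[..., 0] *= -1.0
    return out

def toy_predictor(history, rng, k=500, horizon=12, noise=0.3):
    """Hypothetical stochastic predictor: constant-velocity extrapolation
    plus Gaussian noise. Returns k sampled endpoints, shape (k, 2)."""
    velocity = history[-1] - history[-2]
    mean_endpoint = history[-1] + horizon * velocity
    return mean_endpoint + rng.normal(scale=noise, size=(k, 2))

def wasserstein_1d(a, b):
    """W1 distance between two equal-size 1D empirical samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

def mirroring_mr_distance(predictor, history, seed=0):
    """Compare mirror(predict(x)) with predict(mirror(x)) in distribution."""
    source = predictor(history, np.random.default_rng(seed))
    follow_up = predictor(mirror_x(history), np.random.default_rng(seed + 1))
    expected = mirror_x(source)  # what the MR says the follow-up should look like
    # Simplification: take the worst per-axis 1D distance.
    return max(wasserstein_1d(expected[:, d], follow_up[:, d]) for d in range(2))

history = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.2]])
distance = mirroring_mr_distance(toy_predictor, history)
violated = distance > 0.2  # illustrative threshold, not from the paper
```

Because the predictor is stochastic, two independent runs never match exactly, so the criterion compares sample distributions against a tolerance instead of requiring equality. A predictor with a direction-dependent bias (for example, one that always drifts toward positive x) would produce a large distance and be flagged.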