🤖 AI Summary
Modern traffic diffusion models generate realistic trajectories but frequently violate physical constraints: vehicles collide, leave the road, or spawn inside buildings. On the Waymo dataset, this leaves only 50.3% of trajectories physically valid, with a 24.6% collision rate. To address this, we propose a “validity-first” spatial intelligence framework that explicitly encodes physical constraints, including collision avoidance and kinematic consistency, as energy-based guidance terms applied directly within the diffusion sampling process. This removes the reliance on implicit learning or post-hoc correction and requires no model retraining. The energy terms steer the denoising direction during inference, pushing samples toward constraint-satisfying trajectories. On Waymo, our approach reduces the collision rate to 8.1% and raises overall physical validity to 94.2%, while also improving trajectory fidelity: average displacement error (ADE) falls from 1.34 m to 1.21 m. Physical plausibility therefore improves without a loss in generation quality.
📝 Abstract
Modern diffusion models generate realistic traffic simulations but systematically violate physical constraints. In a large-scale evaluation of SceneDiffuser++, a state-of-the-art traffic simulator, we find that about 50% of generated trajectories violate basic physical laws: vehicles collide, drive off roads, and spawn inside buildings. This reveals a fundamental limitation: current models treat physical validity as an emergent property rather than an architectural requirement. We propose Validity-First Spatial Intelligence (VFSI), which enforces constraints through energy-based guidance during diffusion sampling, without model retraining. By incorporating collision avoidance and kinematic constraints as energy functions, we guide the denoising process toward physically valid trajectories. Across 200 urban scenarios from the Waymo Open Motion Dataset, VFSI reduces the collision rate by a relative 67% (from 24.6% to 8.1%) and improves overall validity by a relative 87% (from 50.3% to 94.2%), while simultaneously improving realism metrics (ADE: 1.34 m to 1.21 m). Our model-agnostic approach demonstrates that explicit constraint enforcement during inference is both necessary and sufficient for physically valid traffic simulation.
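To make the idea of energy-based guidance concrete, the following is a minimal sketch, not the authors' implementation. It assumes a squared-hinge collision energy on pairwise distances, a squared-hinge kinematic energy on speed, and a guided update of the form "denoise, then step down the energy gradient". The constants `D_SAFE`, `V_MAX`, `DT`, the weights, the guidance `scale`, and the identity `denoise` stand-in are all hypothetical choices made for illustration; the gradient is taken by finite differences only to keep the example dependency-free.

```python
import numpy as np

# All constants below are hypothetical, chosen only for this toy example.
D_SAFE = 2.0   # minimum centre-to-centre distance between agents (m)
V_MAX = 15.0   # maximum allowed speed (m/s)
DT = 0.1       # time between trajectory steps (s)

def collision_energy(traj):
    """traj has shape (agents, steps, 2). Squared hinge on same-time pairwise distance."""
    diff = traj[:, None] - traj[None, :]            # (A, A, T, 2)
    dist = np.linalg.norm(diff, axis=-1)            # (A, A, T)
    i, j = np.triu_indices(traj.shape[0], k=1)      # each unordered pair once
    return np.sum(np.maximum(0.0, D_SAFE - dist[i, j]) ** 2)

def kinematic_energy(traj):
    """Squared hinge on per-step speed above V_MAX."""
    speed = np.linalg.norm(np.diff(traj, axis=1), axis=-1) / DT
    return np.sum(np.maximum(0.0, speed - V_MAX) ** 2)

def energy(traj, w_col=1.0, w_kin=0.01):
    """Weighted sum of the two constraint energies (weights are assumptions)."""
    return w_col * collision_energy(traj) + w_kin * kinematic_energy(traj)

def numerical_grad(f, x, eps=1e-4):
    """Central finite-difference gradient; fine for toy sizes, too slow for real use."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def guided_step(traj, denoise, scale=0.1):
    """One guided update: apply the denoiser, then descend the constraint energy."""
    proposal = denoise(traj)
    return proposal - scale * numerical_grad(energy, proposal)

# Toy scene: two agents pass each other 0.5 m apart at the middle time step.
xs = np.linspace(0.0, 10.0, 11)
agent_a = np.stack([xs, np.zeros(11)], axis=-1)
agent_b = np.stack([xs[::-1], np.full(11, 0.5)], axis=-1)
initial = np.stack([agent_a, agent_b])              # shape (2, 11, 2)

denoise = lambda x: x  # hypothetical stand-in for a pretrained diffusion denoiser

guided = initial.copy()
for _ in range(30):
    guided = guided_step(guided, denoise)

min_dist_before = np.linalg.norm(initial[0] - initial[1], axis=-1).min()
min_dist_after = np.linalg.norm(guided[0] - guided[1], axis=-1).min()
print(energy(initial), energy(guided), min_dist_before, min_dist_after)
```

In this toy scene the guidance pushes the two agents apart at the conflicting time step while leaving the rest of both trajectories untouched. The pretrained model is never updated, which mirrors the "no retraining" property described in the abstract; a real system would replace the finite-difference gradient with automatic differentiation and the identity stand-in with the actual denoiser.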