🤖 AI Summary
Current autonomous driving systems rely on multimodal perception but cannot sense road-induced tactile stimuli, which limits their dynamic control performance. Inspired by human synesthesia, this work proposes the Synesthesia of Vehicles (SoV) framework, the first to introduce synesthetic mechanisms into autonomous driving, enabling unsupervised synthesis of high-quality tactile signals from purely visual inputs. This is achieved through a cross-modal spatiotemporal alignment strategy and a latent-diffusion-based visual-tactile synesthesia generation model (VTSyn). Experiments on a real-vehicle multimodal dataset demonstrate that VTSyn outperforms existing methods in both the time and frequency domains and excels in downstream classification tasks, significantly enhancing the system's active tactile perception and safety.
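
The summary mentions a cross-modal spatiotemporal alignment strategy but gives no detail. As a rough illustration only, the sketch below shows one plausible temporal piece of such a step: windowing a high-rate tactile (vibration) stream around each camera frame timestamp so the two modalities can be paired for training. All names, rates, and the windowing scheme are assumptions for illustration, not the paper's method (which also handles spatial disparities):

```python
# Hypothetical sketch: pair a high-rate tactile stream with lower-rate
# camera frames by extracting a fixed-length tactile window centered on
# each frame timestamp. Rates and names are illustrative assumptions.
import numpy as np

def align_tactile_to_frames(tactile, tactile_ts, frame_ts, window_s=0.1):
    """Return one fixed-length tactile window per camera frame.

    tactile:    (N,) vibration samples, e.g. vertical acceleration
    tactile_ts: (N,) sample timestamps in seconds (monotonic)
    frame_ts:   (M,) camera frame timestamps in seconds
    window_s:   half-width of the window centered on each frame
    """
    rate = 1.0 / np.median(np.diff(tactile_ts))   # estimated sample rate
    half = int(round(window_s * rate))            # samples per half-window
    windows = []
    for t in frame_ts:
        c = int(np.searchsorted(tactile_ts, t))   # sample index at frame time
        lo, hi = c - half, c + half
        w = tactile[max(lo, 0):hi]
        # zero-pad at sequence edges so every window has the same length
        w = np.pad(w, (max(0, -lo), max(0, hi - len(tactile))))
        windows.append(w)
    return np.stack(windows)                      # (M, 2 * half)

# Example: a 1 kHz tactile stream aligned to 10 Hz video
tactile_ts = np.arange(0, 5, 1e-3)
tactile = np.random.randn(tactile_ts.size)
frame_ts = np.arange(0, 5, 0.1)
print(align_tactile_to_frames(tactile, tactile_ts, frame_ts).shape)  # (50, 200)
```

In practice the window offset would also need to account for the spatial gap between what the camera sees ahead and what the wheels feel later, which is the "spatial" half of the alignment problem the paper raises.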
📝 Abstract
Autonomous vehicles (AVs) rely on multi-modal fusion for safety, but current visual and optical sensors fail to detect road-induced excitations, which are critical for vehicles' dynamic control. Inspired by human synesthesia, we propose the Synesthesia of Vehicles (SoV), a novel framework to predict tactile excitations from visual inputs for autonomous vehicles. We develop a cross-modal spatiotemporal alignment method to address temporal and spatial disparities. Furthermore, a visual-tactile synesthetic (VTSyn) generative model using latent diffusion is proposed for unsupervised high-quality tactile data synthesis. A real-vehicle perception system collected a multi-modal dataset across diverse road and lighting conditions. Extensive experiments show that VTSyn outperforms existing models in temporal-domain, frequency-domain, and downstream classification performance, enhancing AV safety through proactive tactile perception.
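
The abstract describes VTSyn only at a high level: a latent diffusion model that generates tactile data conditioned on visual inputs. As a minimal sketch of that general technique, the code below implements a standard DDPM-style reverse sampling loop over a tactile latent conditioned on a visual embedding. This is a generic conditional latent diffusion sampler under assumed shapes and a hypothetical denoiser `eps_model`, not the authors' VTSyn implementation:

```python
# Generic conditional DDPM sampling sketch in PyTorch. Assumes a denoiser
# eps_model(z_t, t, cond) trained to predict the noise added to a tactile
# latent z, conditioned on a visual embedding. Schedule and dimensions are
# illustrative assumptions, not taken from the paper.
import torch

@torch.no_grad()
def sample_tactile_latent(eps_model, vis_cond, steps=1000, latent_dim=64):
    betas = torch.linspace(1e-4, 0.02, steps)        # linear noise schedule
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)
    z = torch.randn(vis_cond.size(0), latent_dim)    # start from pure noise
    for t in reversed(range(steps)):
        t_batch = torch.full((z.size(0),), t, dtype=torch.long)
        eps = eps_model(z, t_batch, vis_cond)        # predicted noise
        # DDPM posterior mean for one reverse step
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        mean = (z - coef * eps) / torch.sqrt(alphas[t])
        noise = torch.randn_like(z) if t > 0 else torch.zeros_like(z)
        z = mean + torch.sqrt(betas[t]) * noise
    return z  # would be decoded to a tactile signal by a pretrained decoder

# Toy usage with a placeholder denoiser that predicts zero noise:
dummy = lambda z, t, c: torch.zeros_like(z)
vis = torch.randn(2, 128)                            # batch of visual embeddings
print(sample_tactile_latent(dummy, vis, steps=10).shape)  # torch.Size([2, 64])
```

Working in a latent space rather than on raw vibration waveforms keeps the diffusion process low-dimensional, which is the usual motivation for latent diffusion and is consistent with the abstract's emphasis on high-quality, unsupervised synthesis.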