🤖 AI Summary
This work addresses the high computational cost of multimodal perception in autonomous driving and its limited adaptability to dynamic environmental changes. The authors propose PRAM-R, a framework featuring an asynchronous dual-loop architecture that jointly coordinates perception, reasoning, action, and memory. Central to this approach is a large language model (LLM)-driven, context-aware modality routing mechanism coupled with a hierarchical memory system, enabling adaptive modality selection and long-term consistency preservation. Evaluated on the nuScenes dataset, PRAM-R reduces modality usage by 6.22% while improving memory recall by 20%, all without compromising trajectory prediction accuracy. In synthetic stress tests, the framework demonstrates an 87.2% reduction in routing oscillations, substantially enhancing both computational efficiency and system robustness.
📝 Abstract
Multimodal perception enables robust autonomous driving but incurs unnecessary computational cost when all sensors remain active regardless of context. This paper presents PRAM-R, a unified Perception-Reasoning-Action-Memory framework with LLM-Guided Modality Routing for adaptive autonomous driving. PRAM-R adopts an asynchronous dual-loop design: a fast reactive loop for perception and control, and a slow deliberative loop for reasoning-driven modality selection and memory updates. An LLM router selects and weights modalities using environmental context and sensor diagnostics, while a hierarchical memory module preserves temporal consistency and supports long-term adaptation. We conduct a two-stage evaluation: (1) synthetic stress tests for stability analysis and (2) real-world validation on the nuScenes dataset. Synthetic stress tests confirm an 87.2% reduction in routing oscillations via hysteresis-based stabilization. Real-world validation on nuScenes shows a 6.22% reduction in modality usage and a 20% improvement in memory recall, while maintaining trajectory accuracy comparable to full-modality baselines in complex urban scenarios. Our work demonstrates that LLM-augmented architectures with hierarchical memory achieve efficient, adaptive multimodal perception in autonomous driving.
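To make the hysteresis-based stabilization concrete: the idea is that a modality's on/off state only flips when its relevance score clears an asymmetric pair of thresholds, so scores that hover near a single decision boundary cannot cause rapid toggling. The sketch below is illustrative only; the class name, threshold values, and score interface are assumptions, not details from the paper.

```python
# Hypothetical sketch of hysteresis-based modality routing stabilization.
# The HysteresisRouter name and the 0.6/0.4 thresholds are illustrative
# assumptions; the paper's actual routing interface may differ.

class HysteresisRouter:
    """Keep each modality's on/off state stable unless its relevance
    score crosses an asymmetric pair of thresholds."""

    def __init__(self, modalities, on_thresh=0.6, off_thresh=0.4):
        assert on_thresh > off_thresh  # the gap between them is the hysteresis band
        self.on_thresh = on_thresh
        self.off_thresh = off_thresh
        self.active = {m: True for m in modalities}  # start fully multimodal

    def step(self, scores):
        """scores: modality -> relevance in [0, 1], e.g. produced by the
        LLM router from context and sensor diagnostics.
        Returns the set of modalities to keep active this cycle."""
        for m, s in scores.items():
            if self.active[m] and s < self.off_thresh:
                self.active[m] = False   # drop only on a clearly low score
            elif not self.active[m] and s > self.on_thresh:
                self.active[m] = True    # re-enable only on a clearly high score
        return {m for m, on in self.active.items() if on}

router = HysteresisRouter(["camera", "lidar", "radar"])
# lidar's score of 0.5 sits inside the band, so it keeps its previous state
# rather than flickering on and off between routing cycles.
print(router.step({"camera": 0.9, "lidar": 0.5, "radar": 0.2}))
```

Without the band (a single threshold at, say, 0.5), a score oscillating between 0.49 and 0.51 would toggle the modality every cycle; with the band, such scores leave the state unchanged, which is the oscillation reduction the stress tests measure.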