🤖 AI Summary
Current multimodal large language models (MLLMs) exhibit inaccurate localization and poor noise robustness on structured perception tasks for autonomous driving, such as object detection. To address this, we propose a curriculum-guided reinforcement learning framework that couples curriculum learning with Group Relative Policy Optimization (GRPO). A difficulty-aware data scheduling mechanism enables progressive training on increasingly complex samples, while KL-divergence regularization keeps policy updates stable. We further design a difficulty-aware filtering technique for sparse-reward settings that improves training stability. Evaluated on autonomous-driving detection benchmarks, our approach achieves significant improvements in both detection accuracy and robustness to input perturbations. Ablation studies confirm the critical roles of reward function design and curriculum pacing in ensuring stable and efficient convergence.
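To make the scheduling idea concrete, the sketch below treats curriculum pacing as a growing window over a difficulty-sorted sample pool. This is an illustrative reading rather than the paper's implementation: the function names (`curriculum_pool`, `sample_batch`), the linear pacing schedule, and the `start_frac` parameter are all assumptions.

```python
import random

def curriculum_pool(samples, difficulties, step, total_steps, start_frac=0.3):
    """Return the samples eligible at this training step (hypothetical pacing)."""
    # Sort sample indices from easiest to hardest by a precomputed score.
    order = sorted(range(len(samples)), key=lambda i: difficulties[i])
    # Linear pacing (an assumption): grow the eligible fraction of the
    # difficulty-sorted pool from start_frac at step 0 to 1.0 at the end.
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    cutoff = max(1, int(frac * len(samples)))
    return [samples[i] for i in order[:cutoff]]

def sample_batch(samples, difficulties, step, total_steps, batch_size=8):
    """Draw a training batch from the currently eligible pool."""
    pool = curriculum_pool(samples, difficulties, step, total_steps)
    return random.sample(pool, min(batch_size, len(pool)))
```

Under this view, easy detections dominate early batches, and harder samples (e.g. noisier or more cluttered scenes) enter training only as the eligible pool expands.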
📝 Abstract
Multimodal Large Language Models (MLLMs) excel in vision-language reasoning but often struggle with structured perception tasks requiring precise localization and robustness. We propose a reinforcement learning framework that augments Group Relative Policy Optimization (GRPO) with curriculum-based data scheduling and difficulty-aware filtering. This approach stabilizes optimization under sparse, noisy rewards and enables progressive adaptation to complex samples. Evaluations on autonomous driving benchmarks demonstrate substantial improvements in detection accuracy and robustness. Ablation studies confirm the importance of reward design, KL regularization, and curriculum pacing for convergence stability and generalization. Our findings highlight reinforcement-driven optimization with structured data curricula as a scalable path toward robust and interpretable multimodal detection.
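As a sketch of how the optimization pieces fit together, the code below implements GRPO-style group-relative advantages, a difficulty-aware filter that drops uninformative groups under sparse rewards, and a per-token KL penalty against a reference model. The function names, the `min_std` threshold, and the use of the k3 KL estimator are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: (num_groups, group_size), one scalar reward per rollout.
    # GRPO standardizes each group's rewards by that group's own mean
    # and std, so no learned value function is needed.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    return (rewards - mean) / (std + eps)

def informative_groups(rewards: torch.Tensor, min_std: float = 1e-4) -> torch.Tensor:
    # Difficulty-aware filtering (assumed form): a group whose rollouts
    # all fail or all succeed has near-zero reward variance and thus no
    # relative learning signal; masking it out reduces gradient noise.
    return rewards.std(dim=1) > min_std

def kl_penalty(logp_policy: torch.Tensor, logp_ref: torch.Tensor) -> torch.Tensor:
    # k3 estimator of KL(policy || reference), a common per-token choice
    # in GRPO-style objectives for regularizing toward the reference model.
    log_ratio = logp_ref - logp_policy
    return log_ratio.exp() - log_ratio - 1.0

# Toy usage: the second group (all rollouts fail, e.g. a too-hard sample
# under a sparse detection reward) is filtered before computing advantages.
rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0],
                        [0.0, 0.0, 0.0, 0.0]])
advantages = grpo_advantages(rewards[informative_groups(rewards)])
```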