Enhancing Large Vision Model in Street Scene Semantic Understanding through Leveraging Posterior Optimization Trajectory

📅 2025-01-03

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This paper targets two problems in autonomous-driving street-scene semantic understanding. First, perception models under-fit as the volume of collected data keeps growing. Second, training the downstream perception head on the vehicle is slow and computationally costly. The paper proposes a training paradigm that pairs a pretrained Large Vision Model (LVM) backbone with Posterior Optimization Trajectory (POT) guidance. A POT Generator (POTGen) predicts the posterior (future) optimization direction in advance. This prediction guides the current iteration when training the perception head, which the authors call the POT-Guided scheme (POTGui). The method draws on the fitting capacity and generalization of LVMs while reducing the vehicle's computational burden of training the perception head alongside the LVM backbone. Experiments show that the model converges within 10 epochs, over six times faster than the state-of-the-art approach, and improves performance by over 66.48%.


📝 Abstract

To improve the generalization of the autonomous driving (AD) perception model, vehicles need to update the model over time based on continuously collected data. As time progresses, the amount of data fitted by the AD model expands, which can substantially improve its generalization. However, such ever-expanding data is a double-edged sword for the AD model: once the fitted data volume exceeds the model's fitting capacity, the model is prone to under-fitting. To address this issue, we propose to use pretrained Large Vision Models (LVMs) as the backbone, coupled with a downstream perception head, to understand AD semantic information. This design not only overcomes the aforementioned under-fitting problem thanks to LVMs' powerful fitting capabilities, but also enhances perception generalization thanks to LVMs' vast and diverse training data. On the other hand, to mitigate the vehicle's computational burden of training the perception head while running the LVM backbone, we introduce a Posterior Optimization Trajectory (POT)-Guided optimization scheme (POTGui) to accelerate convergence. Concretely, we propose a POT Generator (POTGen) that generates the posterior (future) optimization direction in advance to guide the current optimization iteration, allowing the model to converge within 10 epochs in general. Extensive experiments demonstrate that, compared to the existing state-of-the-art approach, the proposed method improves performance by over 66.48% and converges over 6 times faster.
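The abstract does not say how POTGen predicts the posterior direction. As intuition only, the sketch below swaps in a standard look-ahead (Nesterov-style) gradient step. That step extrapolates the trajectory to a predicted future point and uses the gradient there to update the current weights. This is not the authors' POTGen. The frozen LVM backbone is reduced to a fixed, hypothetical feature matrix (`FEATURES`), and the perception head is a linear layer with a squared loss. `TARGETS`, `lr`, and `beta` are also illustrative assumptions.

```python
# Hypothetical stand-in: features produced by a frozen backbone (never updated).
FEATURES = [[1.0, 0.0], [0.0, 0.1]]  # ill-conditioned on purpose
TARGETS = [1.0, 1.0]


def loss(w):
    """Squared loss of a linear head w on the fixed features."""
    return 0.5 * sum(
        (sum(f * wi for f, wi in zip(row, w)) - t) ** 2
        for row, t in zip(FEATURES, TARGETS)
    )


def grad(w):
    """Gradient of loss(w) with respect to the head weights."""
    residuals = [
        sum(f * wi for f, wi in zip(row, w)) - t
        for row, t in zip(FEATURES, TARGETS)
    ]
    return [sum(row[j] * r for row, r in zip(FEATURES, residuals)) for j in range(len(w))]


def train_plain(steps, lr=1.0):
    """Baseline: plain gradient descent on the head."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def train_lookahead_guided(steps, lr=1.0, beta=0.9):
    """Assumed stand-in for trajectory guidance: Nesterov-style look-ahead."""
    w = [0.0, 0.0]
    w_prev = list(w)
    for _ in range(steps):
        # Predict where the trajectory is heading by extrapolating the last step.
        ahead = [wi + beta * (wi - pi) for wi, pi in zip(w, w_prev)]
        # Use the gradient at the predicted future point to update the current weights.
        g = grad(ahead)
        w_prev, w = w, [ai - lr * gi for ai, gi in zip(ahead, g)]
    return w


if __name__ == "__main__":
    for steps in (10, 100):
        print(steps, loss(train_plain(steps)), loss(train_lookahead_guided(steps)))
```

On this ill-conditioned toy problem, the guided variant reaches a much lower loss than plain gradient descent for the same number of steps. This only loosely mirrors the faster convergence the abstract reports and says nothing about how POTGen itself is built.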

Problem

Research questions and friction points this paper is trying to address.

Visual Models

Street Scene Understanding

Autonomous Vehicles

Innovation

Methods, ideas, or system contributions that make the work stand out.

Posterior Optimization Trajectories

Large-scale Visual Models

Accelerated Learning Efficiency for Autonomous Vehicles

🔎 Similar Papers

Rethinking Semantic Parsing for Large Language Models: Enhancing LLM Performance with Semantic Hints