CLOVER: Closed-Loop Value Estimation & Ranking for End-to-End Autonomous Driving Planning

📅 2026-05-14

🤖 AI Summary

This work addresses the mismatch in end-to-end autonomous driving between training, which typically imitates a single expert trajectory, and evaluation, which relies on multidimensional rule-based metrics. To bridge this gap, the authors propose a lightweight generator-scorer framework: the generator produces diverse candidate trajectories, while the scorer predicts fine-grained planning sub-metrics that are used to rank the candidates at inference time. Pseudo-expert trajectories are constructed by filtering candidates with the true evaluator, which enables set-level coverage supervision. The generator is then refined through conservative closed-loop self-distillation toward teacher-selected targets, and the authors analyze when an imperfect scorer can reliably improve the generator in this way. Combining vector-Pareto targets with stability regularization, the method achieves 94.5 PDMS and 90.4 EPDMS on NAVSIM (a new state of the art), 48.3 EPDMS on NavHard (matching the strongest reported result), and the lowest L2 error and collision rate among compared methods in nuScenes open-loop evaluation.
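The inference-time ranking step can be illustrated with a small sketch. It is not CLOVER's implementation: it assumes the scorer outputs NAVSIM-style sub-scores (no at-fault collision NC, drivable area compliance DAC, time-to-collision TTC, comfort C, ego progress EP) and that they are aggregated with the PDMS weighting NC × DAC × (5·TTC + 2·C + 5·EP) / 12. The names `SubScores`, `pdms_like`, and `select_trajectory`, and the candidate labels, are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class SubScores:
    # Hypothetical per-candidate sub-scores predicted by the scorer, each in [0, 1].
    nc: float       # no at-fault collision
    dac: float      # drivable area compliance
    ttc: float      # time-to-collision
    comfort: float  # comfort
    ep: float       # ego progress


def pdms_like(s: SubScores) -> float:
    # Multiplicative terms (NC, DAC) gate a weighted average of TTC, comfort, and progress.
    weighted = (5.0 * s.ttc + 2.0 * s.comfort + 5.0 * s.ep) / 12.0
    return s.nc * s.dac * weighted


def select_trajectory(candidates: Sequence[str],
                      predicted: Sequence[SubScores]) -> Tuple[str, float]:
    # Rank candidates by their aggregated predicted score and return the best one.
    if not candidates or len(candidates) != len(predicted):
        raise ValueError("need one prediction per candidate")
    best = max(range(len(candidates)), key=lambda i: pdms_like(predicted[i]))
    return candidates[best], pdms_like(predicted[best])


candidates: List[str] = ["keep_lane", "change_left", "overtake", "brake"]
predicted = [
    SubScores(nc=1.0, dac=1.0, ttc=1.0, comfort=1.0, ep=0.6),
    SubScores(nc=1.0, dac=0.0, ttc=1.0, comfort=1.0, ep=0.9),  # leaves drivable area
    SubScores(nc=0.5, dac=1.0, ttc=1.0, comfort=1.0, ep=1.0),  # at-fault contact risk
    SubScores(nc=1.0, dac=1.0, ttc=1.0, comfort=0.5, ep=0.2),
]
best, score = select_trajectory(candidates, predicted)
print(best, round(score, 3))  # keep_lane 0.833
```

Because NC and DAC act multiplicatively, a single predicted rule violation can push an otherwise fast candidate to the bottom of the ranking. This is why the quality of the scorer's sub-score predictions, and not only its overall score, matters for proposal selection.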

📝 Abstract

End-to-end autonomous driving planners are commonly trained by imitating a single logged trajectory, yet evaluated by rule-based planning metrics that measure safety, feasibility, progress, and comfort. This creates a training-evaluation mismatch: trajectories close to the logged path may violate planning rules, while alternatives farther from the demonstration can remain valid and high-scoring. The mismatch is especially limiting for proposal-selection planners, whose performance depends on candidate-set coverage and scorer ranking quality. We propose CLOVER, a Closed-LOop Value Estimation and Ranking framework for end-to-end autonomous driving planning. CLOVER follows a lightweight generator-scorer formulation: a generator produces diverse candidate trajectories, and a scorer predicts planning-metric sub-scores to rank them at inference time. To expand proposal support beyond single-trajectory imitation, CLOVER constructs evaluator-filtered pseudo-expert trajectories and trains the generator with set-level coverage supervision. It then performs conservative closed-loop self-distillation: the scorer is fitted to true evaluator sub-scores on generated proposals, while the generator is refined toward teacher-selected top-k and vector-Pareto targets with stability regularization. We analyze when an imperfect scorer can improve the generator, showing that scorer-mediated refinement is reliable when scorer-selected targets are enriched under the true evaluator and updates remain conservative. On NAVSIM, CLOVER achieves 94.5 PDMS and 90.4 EPDMS, establishing a new state of the art. On the more challenging NavHard split, it obtains 48.3 EPDMS, matching the strongest reported result. On supplementary nuScenes open-loop evaluation, CLOVER achieves the lowest L2 error and collision rate among compared methods. Code and data will be released at https://github.com/WilliamXuanYu/CLOVER.
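Selecting refinement targets for the generator can be sketched as follows, under loud assumptions. The abstract does not specify how CLOVER defines its top-k or vector-Pareto targets. Here the target set is taken to be the union of the k proposals with the highest scalar teacher score and the non-dominated (Pareto-optimal) proposals under the teacher's sub-score vectors, with the teacher assumed to be the scorer. `dominates`, `pareto_front`, `top_k`, and `refinement_targets` are hypothetical names, and the numbers are illustrative only.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dominates(a: Vector, b: Vector) -> bool:
    # a dominates b if it is at least as good on every sub-score and strictly better on one.
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(vectors: List[Vector]) -> List[int]:
    # Indices of non-dominated vectors; O(n^2) scan, adequate for small proposal sets.
    return [
        i for i, v in enumerate(vectors)
        if not any(dominates(w, v) for j, w in enumerate(vectors) if j != i)
    ]


def top_k(scores: Sequence[float], k: int) -> List[int]:
    # Indices of the k highest scalar scores, best first.
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def refinement_targets(vectors: List[Vector], scalar_scores: Sequence[float],
                       k: int) -> List[int]:
    # Hypothetical target set: teacher top-k united with the vector-Pareto front.
    return sorted(set(top_k(scalar_scores, k)) | set(pareto_front(vectors)))


# Assumed teacher (scorer) sub-score vectors: (safety, progress, comfort).
vectors = [(1.0, 0.9, 0.5), (1.0, 0.5, 1.0), (0.8, 0.4, 0.4), (0.5, 1.0, 0.9)]
scalar = [0.9, 0.8, 0.5, 0.6]
print(pareto_front(vectors))                   # [0, 1, 3]
print(refinement_targets(vectors, scalar, 2))  # [0, 1, 3]
```

Using the Pareto front alongside top-k keeps proposals that trade one sub-metric against another, which a single scalar ranking would discard. The conservative updates and stability regularization described in the abstract are not modeled in this sketch.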

Problem

Research questions and friction points this paper is trying to address.

end-to-end autonomous driving

training-evaluation mismatch

trajectory planning

proposal-selection planners

planning metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

closed-loop self-distillation

proposal-selection planning

value estimation and ranking