The Case for Negative Data: From Crash Reports to Counterfactuals for Reasonable Driving

📅 2025-09-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing autonomous driving models trained solely on accident-free data exhibit poor discriminative capability near safety boundaries. This work addresses the problem by leveraging real-world crash reports, structured as negative examples, to build a counterfactual-reasoning-based decision system. First, unstructured third-person accident narratives are normalized into ego-centric, unified scene-action representations. Second, an agent-centric counterfactual reasoning mechanism is introduced, combining joint embedding of vehicle logs and accident data with retrieval-augmented precedent matching to proactively avoid high-risk scenarios. On nuScenes, precedent retrieval alone raises recall on contextually preferred actions from 24% to 53%, and adding counterfactual reasoning further improves decision accuracy in high-risk regions. To the authors' knowledge, this is the first systematic effort to transform real-world accident negatives into learnable counterfactual supervision signals, thereby strengthening robustness for safety-critical decision-making.
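The retrieve-and-adjudicate loop the summary describes can be sketched roughly as follows. This is a toy illustration, not the paper's method: the embedding vectors, precedent store, risk values, and the similarity-weighted risk score are all hypothetical stand-ins for the learned joint-embedding and retrieval components.

```python
from math import sqrt

# Toy precedent index: (scene embedding, action, observed outcome risk in [0, 1]).
# In the paper these would come from normalized crash reports and driving logs;
# the vectors and risk values here are invented for illustration.
PRECEDENTS = [
    ([0.9, 0.1], "overtake", 0.8),   # crash precedent: risky overtake
    ([0.8, 0.2], "brake", 0.1),      # log precedent: safe braking
    ([0.1, 0.9], "overtake", 0.2),   # log precedent: safe overtake on an open road
]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve(scene, action, k=2):
    """Return up to k precedents for this action, ranked by scene similarity."""
    matches = [(cosine(scene, emb), risk)
               for emb, a, risk in PRECEDENTS if a == action]
    return sorted(matches, reverse=True)[:k]

def adjudicate(scene, candidate_actions):
    """Counterfactual step: retrieve precedents for each alternative action,
    then choose the action whose similar precedents carry the lowest risk."""
    def expected_risk(action):
        matches = retrieve(scene, action)
        if not matches:
            return 0.5  # no precedent found: neutral prior (assumption)
        total_sim = sum(s for s, _ in matches)
        return sum(s * r for s, r in matches) / total_sim
    return min(candidate_actions, key=expected_risk)

# Ego scene embedding resembling the crash precedent: braking is preferred.
scene = [0.85, 0.15]
print(adjudicate(scene, ["overtake", "brake"]))  # → brake
```

The key design point mirrored here is that crash reports and logs live in one index, so a single retrieval call surfaces both negative and positive evidence for each counterfactual alternative.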

📝 Abstract
Learning-based autonomous driving systems are trained mostly on incident-free data, offering little guidance near safety-performance boundaries. Real crash reports contain precisely the contrastive evidence needed, but they are hard to use: narratives are unstructured, third-person, and poorly grounded to sensor views. We address these challenges by normalizing crash narratives to ego-centric language and converting both logs and crashes into a unified scene-action representation suitable for retrieval. At decision time, our system adjudicates proposed actions by retrieving relevant precedents from this unified index; an agentic counterfactual extension proposes plausible alternatives, retrieves for each, and reasons across outcomes before deciding. On a nuScenes benchmark, precedent retrieval substantially improves calibration, with recall on contextually preferred actions rising from 24% to 53%. The counterfactual variant preserves these gains while sharpening decisions near risk.
Problem

Research questions and friction points this paper is trying to address.

Normalizing unstructured crash reports into structured ego-centric representations
Creating unified scene-action representation for retrieving safety precedents
Improving decision calibration and risk assessment in autonomous driving
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalizing crash narratives to ego-centric language
Converting logs and crashes into unified scene-action representation
Retrieving precedents and reasoning across counterfactual outcomes
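As a rough illustration of the first two innovations, the sketch below maps a third-person crash narrative into an ego-centric scene-action record. The field names, the `" while "` split heuristic, and the string-replacement rewrite are hypothetical; the paper presumably uses a learned or LLM-based normalizer rather than rules.

```python
from dataclasses import dataclass

@dataclass
class SceneAction:
    """Unified record shared by crash reports and driving logs
    (field names are assumptions for illustration)."""
    scene: str      # ego-centric scene context
    action: str     # action taken by the ego vehicle
    outcome: str    # "crash" for report negatives, "safe" for log positives

def normalize_narrative(narrative: str, subject: str) -> SceneAction:
    """Rewrite a third-person narrative into ego-centric language by
    substituting the reported subject with the ego vehicle, then split
    off the scene context. A real system would use an LLM or parser;
    simple string operations stand in here."""
    ego_text = narrative.replace(subject, "the ego vehicle")
    # Crude split into action and scene (assumption: " while " separator).
    action, _, scene = ego_text.partition(" while ")
    return SceneAction(scene=scene.strip(), action=action.strip(), outcome="crash")

record = normalize_narrative(
    "Vehicle 1 changed lanes while traffic was slowing in the adjacent lane",
    subject="Vehicle 1",
)
print(record.action)  # → the ego vehicle changed lanes
```

Once reports and logs share this record shape, both can be embedded into the same retrieval index that the precedent-matching step queries.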