Photorealistic Phantom Roads in Real Scenes: Disentangling 3D Hallucinations from Physical Geometry

📅 2025-12-17
🤖 AI Summary
Monocular depth foundation models, which rely on strong semantic priors, often hallucinate spurious 3D structures (termed the “3D Mirage”) in regions that are geometrically planar yet perceptually ambiguous, such as street paintings. This poses a safety risk that has not previously been quantified. This work is the first to systematically expose, quantify, and mitigate the phenomenon. We introduce the first real-world 3D Mirage hallucination benchmark. We propose a dual-metric, Laplacian-based evaluation framework in which DCS (Deviation Composite Score) measures hallucinated non-planarity and CCS (Confusion Composite Score) quantifies contextual instability. We also design Grounded Self-Distillation, a frozen-teacher/tunable-student framework that uses plane-aware self-distillation to suppress hallucinations while preserving semantic knowledge. Experiments demonstrate a 42% reduction in DCS and a 38% reduction in CCS. The work advocates shifting the evaluation of monocular depth estimation from semantics-driven accuracy toward structural robustness.
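A Laplacian-based test for non-planarity rests on a standard projective-geometry fact, sketched below under a pinhole-camera assumption. The symbols (intrinsics K, pixel (u, v), depth Z, plane normal n, offset d ≠ 0, and the coefficients a, b, c) are our own illustration and not necessarily how the paper formalizes DCS. For a real plane, inverse depth is affine in pixel coordinates, so its image-space Laplacian vanishes:

```latex
\mathbf{X} = Z\,K^{-1}\begin{bmatrix}u\\ v\\ 1\end{bmatrix},\qquad
\mathbf{n}^\top\mathbf{X} = d
\;\Longrightarrow\;
\frac{1}{Z} = \frac{1}{d}\,\mathbf{n}^\top K^{-1}\begin{bmatrix}u\\ v\\ 1\end{bmatrix} = a\,u + b\,v + c
\;\Longrightarrow\;
\frac{\partial^2}{\partial u^2}\!\left(\frac{1}{Z}\right) + \frac{\partial^2}{\partial v^2}\!\left(\frac{1}{Z}\right) = 0.
```

A nonzero Laplacian of predicted inverse depth (disparity) inside an annotated planar region is therefore evidence of hallucinated curvature. Depth itself is not affine on a tilted plane, so the test is most naturally applied to disparity.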


📝 Abstract
Monocular depth foundation models achieve remarkable generalization by learning large-scale semantic priors, but this creates a critical vulnerability: they hallucinate illusory 3D structures from geometrically planar but perceptually ambiguous inputs. We term this failure the 3D Mirage. This paper introduces the first end-to-end framework to probe, quantify, and tame this unquantified safety risk. To probe, we present 3D-Mirage, the first benchmark of real-world illusions (e.g., street art) with precise planar-region annotations and context-restricted crops. To quantify, we propose a Laplacian-based evaluation framework with two metrics: the Deviation Composite Score (DCS) for spurious non-planarity and the Confusion Composite Score (CCS) for contextual instability. To tame this failure, we introduce Grounded Self-Distillation, a parameter-efficient strategy that surgically enforces planarity on illusion ROIs while using a frozen teacher to preserve background knowledge, thus avoiding catastrophic forgetting. Our work provides the essential tools to diagnose and mitigate this phenomenon, urging a necessary shift in MDE evaluation from pixel-wise accuracy to structural and contextual robustness. Our code and benchmark will be publicly available to foster this exciting research direction.
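To illustrate what a Laplacian-based measure of spurious non-planarity could look like, the sketch below scores a predicted disparity map by its mean absolute discrete Laplacian inside an annotated planar mask. The abstract does not give the exact DCS formula, so the function names (nonplanarity_score, laplacian_5pt, erode_4) and the aggregation are hypothetical and should not be read as the paper's metric. The sketch uses disparity because, for a real plane seen by a pinhole camera, inverse depth is affine in pixel coordinates and therefore has zero Laplacian.

```python
import numpy as np


def laplacian_5pt(img: np.ndarray) -> np.ndarray:
    """Discrete 5-point Laplacian; border pixels are left at 0."""
    img = np.asarray(img, dtype=np.float64)
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (
        img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2] + img[1:-1, 2:]
        - 4.0 * img[1:-1, 1:-1]
    )
    return lap


def erode_4(mask: np.ndarray) -> np.ndarray:
    """Keep only pixels whose 4 neighbours are also in the mask (1-px erosion)."""
    m = np.asarray(mask, dtype=bool)
    inner = np.zeros_like(m)
    inner[1:-1, 1:-1] = (
        m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    )
    return inner


def nonplanarity_score(disparity: np.ndarray, planar_mask: np.ndarray) -> float:
    """Hypothetical DCS-style score: mean |Laplacian| of disparity on the ROI.

    A truly planar region has affine disparity, hence zero Laplacian; a positive
    score indicates predicted curvature (hallucinated relief).
    """
    inner = erode_4(planar_mask)
    if not inner.any():
        raise ValueError("planar mask too small after erosion")
    return float(np.mean(np.abs(laplacian_5pt(disparity)[inner])))


# Usage: a tilted plane scores ~0; the same plane with a "painted" bump does not.
v, u = np.mgrid[0:32, 0:32].astype(np.float64)
plane = 0.01 * u + 0.02 * v + 1.0
bump = plane + 0.5 * np.exp(-((u - 16) ** 2 + (v - 16) ** 2) / 20.0)
mask = np.ones((32, 32), dtype=bool)
print(nonplanarity_score(plane, mask), nonplanarity_score(bump, mask))
```

The mask is eroded by one pixel so the stencil never straddles the ROI boundary, where a genuine depth edge would otherwise be counted as hallucination. The score is in raw disparity units. Normalizing it for the scale ambiguity of relative-depth models would be a further design choice.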
Problem

Research questions and friction points this paper is trying to address.

Monocular depth models hallucinate 3D structures from planar inputs
The paper introduces a benchmark to probe and quantify this hallucination risk
It proposes a method to enforce planarity while preserving background knowledge
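The idea of enforcing planarity on the illusion region while anchoring everything else to a frozen teacher can be sketched as a two-term objective. This is a hypothetical NumPy forward computation, not the paper's Grounded Self-Distillation implementation. Here the planarity term is the residual of a least-squares affine fit a*u + b*v + c to the student's disparity inside the ROI, and the preservation term is an L1 distance to the frozen teacher outside it. The names and the weights lambda_plane and lambda_distill are illustrative assumptions. A real training setup would compute this in an autodiff framework and update only the student's tunable parameters.

```python
import numpy as np


def plane_fit_residual(disp: np.ndarray, roi: np.ndarray) -> float:
    """Mean squared residual of a least-squares fit disp ~ a*u + b*v + c on the ROI."""
    rows, cols = np.nonzero(roi)
    if rows.size < 3:
        raise ValueError("need at least 3 ROI pixels to fit a plane")
    A = np.stack([cols, rows, np.ones_like(rows)], axis=1).astype(np.float64)
    y = np.asarray(disp, dtype=np.float64)[rows, cols]
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.mean((y - A @ coeffs) ** 2))


def grounded_self_distillation_loss(student, teacher, roi,
                                    lambda_plane=1.0, lambda_distill=1.0):
    """Hypothetical two-term objective (forward pass only).

    student, teacher: HxW disparity maps; the teacher is frozen (never updated).
    roi: HxW bool mask of the illusion region that should be planar.
    Returns (total, planarity_term, distillation_term).
    """
    roi = np.asarray(roi, dtype=bool)
    student = np.asarray(student, dtype=np.float64)
    teacher = np.asarray(teacher, dtype=np.float64)
    # Inside the ROI: penalise deviation from the best-fitting plane.
    l_plane = plane_fit_residual(student, roi)
    # Outside the ROI: stay close to the frozen teacher to avoid forgetting.
    bg = ~roi
    l_distill = float(np.mean(np.abs(student[bg] - teacher[bg]))) if bg.any() else 0.0
    total = lambda_plane * l_plane + lambda_distill * l_distill
    return total, l_plane, l_distill
```

The planarity term only requires the ROI to lie on some plane, not on the teacher's prediction. The student is therefore free to flatten a hallucinated bump, while the teacher term discourages drift in the background.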
Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmark with planar annotations for real-world illusions
Laplacian-based metrics for spurious non-planarity and instability
Parameter-efficient self-distillation to enforce planarity on illusions
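One plausible Laplacian-based reading of the instability metric is to compare the curvature a model predicts for the same planar region with and without its surrounding context, using the benchmark's context-restricted crops. The sketch below is entirely hypothetical. The function name context_instability, the alignment step, and the aggregation are our assumptions, not the paper's CCS. It aligns a crop-only prediction to the full-image prediction with a least-squares scale and shift, then averages the absolute difference of their discrete Laplacians over the ROI interior.

```python
import numpy as np


def laplacian_5pt(img: np.ndarray) -> np.ndarray:
    """Discrete 5-point Laplacian; border pixels are left at 0."""
    img = np.asarray(img, dtype=np.float64)
    lap = np.zeros_like(img)
    lap[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2]
                       + img[1:-1, 2:] - 4.0 * img[1:-1, 1:-1])
    return lap


def context_instability(full_disp, crop_disp, box, roi) -> float:
    """Hypothetical CCS-style score: curvature disagreement with vs. without context.

    full_disp: HxW disparity predicted from the whole image.
    crop_disp: disparity predicted from the crop image[y0:y1, x0:x1] alone.
    box: (y0, y1, x0, x1) crop window in full-image pixels.
    roi: HxW bool mask of the planar illusion region (full-image coordinates).
    """
    y0, y1, x0, x1 = box
    ref = np.asarray(full_disp, dtype=np.float64)[y0:y1, x0:x1]
    crop = np.asarray(crop_disp, dtype=np.float64)
    if crop.shape != ref.shape:
        raise ValueError("crop_disp must match the crop window size")
    m = np.asarray(roi, dtype=bool)[y0:y1, x0:x1]
    # Relative-depth outputs are defined up to scale and shift: align the crop
    # prediction to the full-image one by least squares on the ROI.
    A = np.stack([crop[m], np.ones(int(m.sum()))], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, ref[m], rcond=None)
    aligned = s * crop + t
    # Compare Laplacians only where the whole 5-point stencil lies in the ROI.
    inner = np.zeros_like(m)
    inner[1:-1, 1:-1] = (m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1]
                         & m[1:-1, :-2] & m[1:-1, 2:])
    if not inner.any():
        raise ValueError("ROI too small inside the crop")
    diff = np.abs(laplacian_5pt(aligned) - laplacian_5pt(ref))
    return float(diff[inner].mean())
```

The scale alignment matters because the Laplacian is linear: a prediction that is merely twice as large would otherwise appear to have twice the curvature. A constant shift does not change the Laplacian, but fitting it keeps the offset from distorting the scale estimate.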
Hoang Nguyen
University of Michigan, Ann Arbor
Xiaohao Xu
Google; University of Michigan, Ann Arbor
Robust Visual Intelligence, Anomaly Detection, Video&3D, Computer Vision, Robotics
Xiaonan Huang
University of Michigan, Ann Arbor