ViT-NeBLa: A Hybrid Vision Transformer and Neural Beer-Lambert Framework for Single-View 3D Reconstruction of Oral Anatomy from Panoramic Radiographs

📅 2025-06-16
🤖 AI Summary
Reconstructing 3D oral anatomy from a single panoramic radiograph (PX) is difficult because a PX carries no depth information, and existing reconstruction models depend on CBCT flattening or prior dental-arch information that is often unavailable clinically. CBCT itself adds cost and radiation exposure. Method: We propose the first end-to-end Vision Transformer (ViT)-enhanced Neural Beer-Lambert framework. Key innovations include a horseshoe-shaped point sampling strategy with non-intersecting rays (reducing sampling point computations by 52%), a hybrid ViT-CNN backbone, and learnable hash positional encoding. The method needs only a single PX view as input: no CBCT flattening and no prior dental-arch information. Results: The approach achieves state-of-the-art PSNR and SSIM and produces visually more faithful 3D reconstructions than prior methods, offering a cost-effective, radiation-efficient alternative for dental 3D diagnosis.
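
For reference, the Beer-Lambert law that the framework is named after relates the X-ray intensity reaching the detector to the attenuation accumulated along a ray. The notation below ($I_0$, $\mu$, $L$, $\mathbf{x}_k$, $\Delta s_k$) is assumed here for illustration and is not taken from the paper:

$$
I = I_0 \exp\left(-\int_0^{L} \mu(s)\, ds\right),
\qquad
-\ln\frac{I}{I_0} \approx \sum_{k=1}^{K} \mu(\mathbf{x}_k)\, \Delta s_k
$$

Here $I_0$ is the source intensity, $\mu$ the attenuation coefficient along a ray of length $L$, and the sum is a discretization over $K$ sample points $\mathbf{x}_k$ with spacing $\Delta s_k$. In a neural variant, a network would presumably predict $\mu(\mathbf{x}_k)$ at the sampled points, which is why the number of sampling point computations matters.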

📝 Abstract
Dental diagnosis relies on two primary imaging modalities: panoramic radiographs (PX), which provide 2D representations of the oral cavity, and Cone-Beam Computed Tomography (CBCT), which offers detailed 3D anatomical information. While PX images are cost-effective and accessible, their lack of depth information limits diagnostic accuracy. CBCT addresses this but has drawbacks including higher costs, increased radiation exposure, and limited accessibility. Existing reconstruction models further complicate the process by requiring CBCT flattening or prior dental arch information, which is often unavailable clinically. We introduce ViT-NeBLa, a vision transformer-based Neural Beer-Lambert model enabling accurate 3D reconstruction directly from a single PX. Our key innovations include: (1) enhancing the NeBLa framework with Vision Transformers for improved reconstruction capabilities without requiring CBCT flattening or prior dental arch information, (2) implementing a novel horseshoe-shaped point sampling strategy with non-intersecting rays, which eliminates the intermediate density aggregation that existing models need because their rays intersect and reduces sampling point computations by 52%, (3) replacing the CNN-based U-Net with a hybrid ViT-CNN architecture for superior global and local feature extraction, and (4) implementing learnable hash positional encoding for better higher-dimensional representation of 3D sample points compared to existing Fourier-based dense positional encoding. Experiments demonstrate that ViT-NeBLa significantly outperforms prior state-of-the-art methods both quantitatively and qualitatively, offering a cost-effective, radiation-efficient alternative for enhanced dental diagnostics.
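
The abstract does not spell out the sampling geometry, so the following is only a minimal sketch of the general idea behind innovation (2): place sample points along rays that cannot cross inside a horseshoe-shaped band, then accumulate attenuation per ray with a Beer-Lambert sum. The half-ellipse arch, the semi-axes `a` and `b`, the scale range, the uniform density value, and both function names are hypothetical choices made for illustration; they are not the authors' parameters or code.

```python
import numpy as np


def horseshoe_sample_points(n_rays=8, n_samples=16, a=40.0, b=55.0,
                            s_min=0.7, s_max=1.3):
    """Sample 2D points on a horseshoe-shaped band (hypothetical geometry).

    The arch is a half-ellipse with semi-axes a and b (assumed, in mm).
    Ray i passes through the arch point at angle theta_i and is sampled
    between the scaled copies s_min * arch and s_max * arch. Each ray lies
    on its own half-line from the origin, so no two rays cross for s > 0.
    Returns points of shape (n_rays, n_samples, 2) and the per-ray spacing.
    """
    theta = np.linspace(0.0, np.pi, n_rays)
    arch = np.stack([a * np.cos(theta), b * np.sin(theta)], axis=-1)  # (R, 2)
    scales = np.linspace(s_min, s_max, n_samples)                     # (S,)
    points = scales[None, :, None] * arch[:, None, :]                 # (R, S, 2)
    # Distance between consecutive samples on each ray.
    step = np.linalg.norm(arch, axis=-1) * (s_max - s_min) / (n_samples - 1)
    return points, step


def beer_lambert_project(mu, step, i0=1.0):
    """Attenuate intensity i0 along each ray with a rectangle-rule sum.

    mu:   (n_rays, n_samples) attenuation values at the sample points.
    step: (n_rays,) spacing between samples on each ray.
    """
    optical_depth = np.sum(mu, axis=1) * step
    return i0 * np.exp(-optical_depth)


points, step = horseshoe_sample_points()
# Made-up uniform attenuation; a trained network would predict this instead.
mu = np.full(points.shape[:2], 0.02)
intensity = beer_lambert_project(mu, step)
print(points.shape, intensity.shape)
```

Because every sample point belongs to exactly one ray in this layout, each ray's sum can be computed directly, which is consistent with the abstract's remark that non-intersecting rays remove the need for intermediate density aggregation.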
Problem

Research questions and friction points this paper is trying to address.

Reconstructing 3D oral anatomy from a single 2D panoramic radiograph
Removing the need for CBCT flattening or prior dental arch information
Reducing computational cost and radiation exposure
Innovation

Methods, ideas, or system contributions that make the work stand out.

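Hybrid ViT-CNN architecture replaces the CNN-based U-Net for global and local feature extraction
Horseshoe-shaped sampling with non-intersecting rays reduces sampling point computations by 52%
Learnable hash positional encoding replaces Fourier-based dense encoding of 3D sample points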
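
To make the contrast between the two positional encodings concrete, here is a rough sketch that places a fixed Fourier encoding next to a multiresolution hash-grid encoding of the kind used in neural rendering. The summary does not describe the paper's actual hash function, number of levels, table size, or feature width, so every parameter below, the `HashEncoding` class, and the `fourier_encode` helper are assumptions for illustration. Only the forward pass is shown; in training, the table entries would be updated by gradient descent, which is what makes the encoding learnable.

```python
import numpy as np

# Large primes commonly used for spatial hashing (assumed choice).
PRIMES = np.array([1, 2654435761, 805459861], dtype=np.uint64)
# The 8 corner offsets of a unit voxel.
OFFSETS = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)])


def fourier_encode(x, n_freqs=4):
    """Fixed sin/cos encoding of points x with shape (N, 3) in [0, 1]."""
    freqs = (2.0 ** np.arange(n_freqs)) * np.pi            # (F,)
    angles = x[:, :, None] * freqs                         # (N, 3, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(len(x), -1)                         # (N, 3 * 2 * F)


class HashEncoding:
    """Multiresolution hash-grid encoding (forward pass only)."""

    def __init__(self, n_levels=4, table_size=2**12, n_features=2,
                 base_res=4, growth=2.0, seed=0):
        rng = np.random.default_rng(seed)
        self.resolutions = [int(base_res * growth ** l) for l in range(n_levels)]
        self.table_size = table_size
        # These tables would be the learnable parameters.
        self.tables = rng.uniform(-1e-4, 1e-4,
                                  size=(n_levels, table_size, n_features))

    def _hash(self, ijk):
        """Map integer grid coordinates (..., 3) to table indices."""
        v = ijk.astype(np.uint64) * PRIMES
        h = v[..., 0] ^ v[..., 1] ^ v[..., 2]
        return (h % np.uint64(self.table_size)).astype(np.int64)

    def __call__(self, x):
        feats = []
        for level, res in enumerate(self.resolutions):
            pos = x * res                                   # (N, 3)
            base = np.floor(pos).astype(np.int64)
            frac = pos - base
            corners = base[:, None, :] + OFFSETS[None]      # (N, 8, 3)
            # Trilinear interpolation weights for the 8 voxel corners.
            w = np.where(OFFSETS[None] == 1,
                         frac[:, None, :], 1.0 - frac[:, None, :]).prod(axis=-1)
            idx = self._hash(corners)                       # (N, 8)
            feats.append((w[..., None] * self.tables[level][idx]).sum(axis=1))
        return np.concatenate(feats, axis=-1)               # (N, levels * features)


x = np.random.default_rng(1).uniform(0.0, 1.0, size=(5, 3))
print(fourier_encode(x).shape, HashEncoding()(x).shape)
```

The design difference is that the Fourier features are a fixed function of the coordinates, whereas the hash grid stores trainable feature vectors that are looked up and interpolated per point.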