🤖 AI Summary
Neural Radiance Fields (NeRF) struggle to reconstruct industrial-grade fine-scale structures, such as sub-micron defects and chip-level micro-topography, under camera pose-free settings due to insufficient geometric fidelity at microscopic scales.
Method: We propose the first NeRF framework natively supporting joint optimization of multi-scale images. Our approach introduces learnable scale parameters and a wide-field-of-view-guided pose initialization strategy, integrates a focal-length-adaptive scaling mechanism based on an improved pinhole camera model, and employs cropping-aware pose pre-alignment to ensure cross-scale geometric consistency.
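The focal-length-adaptive scaling mechanism can be illustrated with a minimal sketch: an explicit zoom scalar multiplies the focal length inside a standard pinhole intrinsics matrix. The function and parameter names below are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

def intrinsics_with_zoom(fx, fy, cx, cy, zoom):
    """Pinhole intrinsics K augmented with an explicit zoom scalar
    that scales the focal length. In the paper's setting this scalar
    is learnable per image; here it is a fixed argument for clarity."""
    return np.array([
        [fx * zoom, 0.0,       cx],
        [0.0,       fy * zoom, cy],
        [0.0,       0.0,       1.0],
    ])

# A wide-field view (zoom = 1) and a 4x zoom-in view share all other
# intrinsic parameters; only the effective focal length changes.
K_wide = intrinsics_with_zoom(1200.0, 1200.0, 320.0, 240.0, zoom=1.0)
K_zoom = intrinsics_with_zoom(1200.0, 1200.0, 320.0, 240.0, zoom=4.0)
```

Because only the focal entries change, rays from a zoom-in image remain geometrically consistent with the wide-field frame, which is what enables joint optimization across scales.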
Results: Experiments on both synthetic and real-world datasets demonstrate substantial improvements: +28% PSNR, +10% SSIM, and −222% LPIPS over state-of-the-art methods. To our knowledge, this is the first method achieving unified global geometric fidelity and micron-level detail reconstruction without pose supervision.
📝 Abstract
Neural Radiance Fields (NeRF) methods excel at 3D reconstruction from multiple 2D images, even those taken with unknown camera poses. However, they still miss the fine-detailed structures that matter in industrial inspection, e.g., detecting sub-micron defects on a production line or analyzing chips with Scanning Electron Microscopy (SEM). In these scenarios, the sensor resolution is fixed and compute budgets are tight, so the only way to expose fine structure is to add zoom-in images; yet, this breaks the multi-view consistency that pose-free NeRF training relies on. We propose Multi-Zoom Enhanced NeRF (MZEN), the first NeRF framework that natively handles multi-zoom image sets. MZEN (i) augments the pinhole camera model with an explicit, learnable zoom scalar that scales the focal length, and (ii) introduces a novel pose strategy: wide-field images are solved first to establish a global metric frame, and zoom-in images are then pose-primed to the nearest wide-field counterpart via a zoom-consistent crop-and-match procedure before joint refinement. Across eight forward-facing scenes (synthetic TCAD models, real SEM of micro-structures, and BLEFF objects), MZEN consistently outperforms pose-free baselines and even high-resolution variants, boosting PSNR by up to 28%, SSIM by 10%, and reducing LPIPS by up to 222%. MZEN therefore extends NeRF to real-world factory settings, preserving global accuracy while capturing the micron-level details essential for industrial inspection.
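The crop-and-match pose priming described above can be sketched in two steps: crop the central window of a wide-field image that covers roughly the same field of view as a zoom-in image, then initialize the zoom-in pose from the matched wide-field pose before joint refinement. This is a hedged sketch under assumed conventions (central crop, zoom ratio known); the helper names are hypothetical and not taken from the paper.

```python
import numpy as np

def center_crop_for_zoom(image, zoom_ratio):
    """Crop the central 1/zoom_ratio window of a wide-field image so it
    approximates the field of view of a zoom-in image. The crop would
    then be matched against the zoom-in image (matching step omitted)."""
    h, w = image.shape[:2]
    ch = int(round(h / zoom_ratio))
    cw = int(round(w / zoom_ratio))
    top = (h - ch) // 2
    left = (w - cw) // 2
    return image[top:top + ch, left:left + cw]

def prime_zoom_pose(wide_pose):
    """Initialize a zoom-in camera pose from its nearest wide-field
    counterpart's 4x4 camera-to-world matrix; joint refinement would
    then adjust it (hypothetical helper, simplest possible priming)."""
    return wide_pose.copy()

# Usage: a 4x zoom-in image is primed against a 480x640 wide-field view.
wide_image = np.zeros((480, 640, 3), dtype=np.float32)
matched_crop = center_crop_for_zoom(wide_image, zoom_ratio=4.0)
initial_pose = prime_zoom_pose(np.eye(4))
```

The design point is that the wide-field views, solved first, fix a global metric frame; priming each zoom-in pose from its wide-field counterpart keeps the zoom-in images in that frame instead of letting them drift during pose-free optimization.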