🤖 AI Summary
This work addresses the problem of reconstructing metric-scale, photorealistic 3D scenes from a single image and enabling real-time novel-view synthesis. We propose the first single-pass feedforward neural network that directly regresses a metric-scale 3D Gaussian representation of the scene (inference in under 1 s), coupled with differentiable Gaussian splatting that renders high-resolution novel views in milliseconds under realistic camera trajectories. The method requires no multi-view inputs, depth supervision, or explicit scene priors, and is trained in a purely self-supervised manner. Key contributions include: (i) the first metric-scale reconstruction from a single image with zero-shot generalization; and (ii) the unified achievement of real-time rendering, photorealism, and absolute scale consistency. Quantitatively, our approach reduces LPIPS by 25–34% and DISTS by 21–43% across multiple benchmarks, accelerates synthesis by three orders of magnitude over state-of-the-art methods, and supports cross-dataset zero-shot transfer.
📝 Abstract
We present SHARP, an approach to photorealistic view synthesis from a single image. Given a single photograph, SHARP regresses the parameters of a 3D Gaussian representation of the depicted scene. This is done in less than a second on a standard GPU via a single feedforward pass through a neural network. The 3D Gaussian representation produced by SHARP can then be rendered in real time, yielding high-resolution photorealistic images for nearby views. The representation is metric, with absolute scale, supporting metric camera movements. Experimental results demonstrate that SHARP delivers robust zero-shot generalization across datasets. It sets a new state of the art on multiple datasets, reducing LPIPS by 25–34% and DISTS by 21–43% versus the best prior model, while lowering synthesis time by three orders of magnitude. Code and weights are provided at https://github.com/apple/ml-sharp.
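The abstract describes a two-stage pipeline: a feedforward network regresses metric-scale 3D Gaussian parameters from one image, and a differentiable renderer then synthesizes novel views in real time. The NumPy sketch below illustrates only that data flow; the constant-depth "regressor", the pinhole intrinsics, and the naive per-Gaussian compositing loop are all illustrative assumptions, not SHARP's actual architecture or renderer.

```python
import numpy as np

def regress_gaussians(image, focal=50.0, depth=2.0):
    """Toy stand-in for the feedforward regressor: lift every pixel of the
    input image to one metric-scale isotropic 3D Gaussian.
    The constant depth and pinhole focal length are illustrative
    assumptions; SHARP predicts the representation with a neural network."""
    h, w, _ = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Unproject pixel centers with a pinhole camera at a fixed depth.
    X = (xs - (w - 1) / 2) * depth / focal
    Y = (ys - (h - 1) / 2) * depth / focal
    Z = np.full_like(X, depth, dtype=np.float64)
    means = np.stack([X, Y, Z], axis=-1).reshape(-1, 3)
    colors = image.reshape(-1, 3).astype(np.float64)
    scales = np.full(len(means), depth / focal)   # ~1-pixel footprint
    opacities = np.full(len(means), 0.9)
    return means, scales, colors, opacities

def render(means, scales, colors, opacities, focal=50.0, out_hw=(16, 16)):
    """Naive front-to-back alpha compositing of isotropic Gaussian splats
    (a real splatting renderer tiles and rasterizes this on the GPU)."""
    h, w = out_hw
    img = np.zeros((h, w, 3))
    acc = np.zeros((h, w))             # accumulated opacity per pixel
    ys, xs = np.mgrid[0:h, 0:w]
    for i in np.argsort(means[:, 2]):  # sort near-to-far by depth
        x, y, z = means[i]
        u = x * focal / z + (w - 1) / 2    # project mean to pixel coords
        v = y * focal / z + (h - 1) / 2
        sigma = scales[i] * focal / z      # projected std-dev in pixels
        d2 = (xs - u) ** 2 + (ys - v) ** 2
        alpha = opacities[i] * np.exp(-0.5 * d2 / sigma ** 2)
        weight = (1.0 - acc) * alpha       # transmittance-weighted alpha
        img += weight[..., None] * colors[i]
        acc += weight
    return img
```

Rendering the regressed Gaussians from the original viewpoint roughly reproduces the input image; perturbing the camera in `render` would yield nearby novel views, which is what SHARP's learned, metric-scale version of this pipeline does photorealistically and in real time.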