🤖 AI Summary
Frame interpolation and novel view synthesis fundamentally address the same task, generating an intermediate frame from surrounding frames in time or space, yet their training and evaluation data differ: frame interpolation datasets emphasise temporal motion from a single moving camera, while view synthesis datasets are biased toward stereoscopic depth estimation, preventing fair cross-task comparison. To address this, we build a custom dense linear camera array and use it to capture a multi-camera dataset explicitly designed for view in-betweening, enabling unified evaluation of both families of methods. Leveraging this dataset, we systematically benchmark 3D Gaussian Splatting against classical and deep learning-based frame interpolation algorithms. Results reveal a performance reversal: on real-world scenes, deep learning interpolators do not significantly outperform classical ones, and 3D Gaussian Splatting underperforms frame interpolators by as much as 3.5 dB PSNR; conversely, on synthetic scenes, 3D Gaussian Splatting surpasses frame interpolation by almost 5 dB PSNR at a 95% confidence level. The dataset fills a gap in cross-modal benchmarks for frame generation across temporal and spatial dimensions.
📝 Abstract
Many methods exist for frame synthesis in image sequences; they can be broadly categorised into frame interpolation and view synthesis techniques. Fundamentally, both tackle the same task: interpolating a frame given surrounding frames in time or space. However, most frame interpolation datasets focus on temporal aspects, with a single camera moving through time and space, while view synthesis datasets are typically biased toward stereoscopic depth estimation use cases. This makes direct comparison between view synthesis and frame interpolation methods challenging. In this paper, we develop a novel multi-camera dataset using a custom-built dense linear camera array to enable fair comparison between these approaches. We evaluate classical and deep learning frame interpolators against a view synthesis method (3D Gaussian Splatting) on the task of view in-betweening. Our results reveal that deep learning methods do not significantly outperform classical methods on real image data, with 3D Gaussian Splatting actually underperforming frame interpolators by as much as 3.5 dB PSNR. However, in synthetic scenes the situation reverses -- 3D Gaussian Splatting outperforms frame interpolation algorithms by almost 5 dB PSNR at a 95% confidence level.
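The dB gaps quoted above are differences in peak signal-to-noise ratio (PSNR) between synthesised and ground-truth frames. As a reference for readers unfamiliar with the metric, a minimal implementation (a sketch, assuming images normalised to [0, 1]; not the paper's evaluation code) might look like:

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    PSNR = 10 * log10(max_val^2 / MSE); higher is better. A gap of
    ~3.5 dB or ~5 dB, as reported in the abstract, is a difference
    in this quantity averaged over the test frames.
    """
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Because PSNR is logarithmic, a 3 dB gap corresponds to roughly halving the mean squared error, so the reported 3.5 dB and 5 dB differences are substantial.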