🤖 AI Summary
Frame interpolation and novel view synthesis fundamentally address the same task, generating an intermediate frame from surrounding frames in time or space, yet their training and evaluation data differ: frame interpolation datasets emphasise temporal motion from a single moving camera, while view synthesis datasets are biased toward stereoscopic depth estimation, preventing fair cross-task comparison. To address this, we build a custom dense linear camera array and use it to capture a multi-camera dataset explicitly designed for view in-betweening, enabling unified evaluation of both families of methods. Leveraging this dataset, we systematically benchmark 3D Gaussian Splatting against classical and deep learning-based frame interpolation algorithms. Results reveal a performance reversal: on real-world scenes, deep learning interpolators do not significantly outperform classical ones, and 3D Gaussian Splatting underperforms frame interpolators by as much as 3.5 dB PSNR; conversely, on synthetic scenes, 3D Gaussian Splatting surpasses frame interpolation by almost 5 dB PSNR at a 95% confidence level. The dataset fills a gap in cross-modal benchmarks for frame generation across temporal and spatial dimensions.
📝 Abstract
Many methods exist for frame synthesis in image sequences; they can be broadly categorised into frame interpolation and view synthesis techniques. Fundamentally, both tackle the same task: interpolating a frame given surrounding frames in time or space. However, most frame interpolation datasets focus on temporal aspects, with a single camera moving through time and space, while view synthesis datasets are typically biased toward stereoscopic depth estimation use cases. This makes direct comparison between view synthesis and frame interpolation methods challenging. In this paper, we develop a novel multi-camera dataset using a custom-built dense linear camera array to enable fair comparison between these approaches. We evaluate classical and deep learning frame interpolators against a view synthesis method (3D Gaussian Splatting) on the task of view in-betweening. Our results reveal that deep learning methods do not significantly outperform classical methods on real image data, with 3D Gaussian Splatting actually underperforming frame interpolators by as much as 3.5 dB PSNR. However, in synthetic scenes the situation reverses -- 3D Gaussian Splatting outperforms frame interpolation algorithms by almost 5 dB PSNR at a 95% confidence level.
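The dB gaps quoted above are differences in peak signal-to-noise ratio (PSNR) between synthesised and ground-truth frames. As a reference for readers unfamiliar with the metric, a minimal implementation (a sketch, assuming images normalised to [0, 1]; not the paper's evaluation code) might look like:

```python
import numpy as np

def psnr(reference: np.ndarray, estimate: np.ndarray, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shaped images.

    PSNR = 10 * log10(max_val^2 / MSE); higher is better. A gap of
    ~3.5 dB or ~5 dB, as reported in the abstract, is a difference
    in this quantity averaged over the test frames.
    """
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)
```

Because PSNR is logarithmic, a 3 dB gap corresponds to roughly halving the mean squared error, so the reported 3.5 dB and 5 dB differences are substantial.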