🤖 AI Summary
Real-world camera non-idealities—such as fisheye distortion and rolling shutter—degrade visual system performance, primarily because training data that incorporates authentic camera effects is scarce. To address this, we propose a 4D Gaussian ray tracing framework: the first method to integrate 4D Gaussian splatting representations with physically accurate ray tracing. Our two-stage pipeline jointly enables dynamic scene reconstruction and realistic camera effect synthesis, supporting concurrent modeling of multiple non-idealities. Compared to state-of-the-art approaches, our method achieves significantly faster rendering while maintaining superior or comparable visual fidelity and substantially narrowing the sim-to-real gap. Furthermore, we introduce the first indoor video benchmark dataset featuring eight dynamic scenes and four camera effects—establishing a standardized evaluation platform for camera-aware video generation.
📝 Abstract
Common computer vision systems typically assume an ideal pinhole camera and fail when confronted with real-world camera effects such as fisheye distortion and rolling shutter, mainly because they never learn from training data containing these effects. Existing data generation approaches either incur high costs, suffer from sim-to-real gaps, or fail to accurately model camera effects. To address this bottleneck, we propose 4D Gaussian Ray Tracing (4D-GRT), a novel two-stage pipeline that combines 4D Gaussian Splatting with physically-based ray tracing for camera effect simulation. Given multi-view videos, 4D-GRT first reconstructs dynamic scenes, then applies ray tracing to generate videos with controllable, physically accurate camera effects. 4D-GRT achieves the fastest rendering speed while delivering better or comparable rendering quality relative to existing baselines. Additionally, we construct eight synthetic dynamic scenes in indoor environments across four camera effects as a benchmark for evaluating generated videos with camera effects.
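The abstract does not give implementation details, but the two camera effects it names admit well-known textbook models. The sketch below (not the paper's actual code; all function names are illustrative) shows the equidistant fisheye projection, where the image radius grows linearly with the ray's angle from the optical axis rather than with its tangent, and a rolling-shutter row-time schedule, where each sensor row samples the dynamic scene at a slightly later instant:

```python
import numpy as np

def fisheye_equidistant_project(points, f):
    """Project 3D camera-frame points with the equidistant fisheye model:
    r = f * theta, where theta is the angle from the optical axis.
    (A pinhole camera instead gives r = f * tan(theta), which diverges
    toward 90 degrees; the fisheye model stays bounded.)"""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    phi = np.arctan2(y, x)                  # azimuth around the axis
    r = f * theta
    return np.stack([r * np.cos(phi), r * np.sin(phi)], axis=1)

def rolling_shutter_row_times(num_rows, t_frame_start, readout_time):
    """Per-row capture times for a rolling-shutter sensor.
    Row i is read at t_frame_start + (i / (num_rows - 1)) * readout_time,
    so when rendering a dynamic scene, the rays of each image row must
    query the scene at that row's own timestamp."""
    return t_frame_start + np.linspace(0.0, readout_time, num_rows)

# A point on the optical axis projects to the image center,
# and rows are read out at evenly spaced times across the frame.
center = fisheye_equidistant_project(np.array([[0.0, 0.0, 1.0]]), f=1.0)
times = rolling_shutter_row_times(num_rows=480, t_frame_start=0.0,
                                  readout_time=0.03)
```

In a ray-traced pipeline like the one described, both effects reduce to per-ray changes: fisheye alters each ray's direction at generation time, and rolling shutter assigns each ray a per-row timestamp at which the 4D scene representation is evaluated, which is why the two can be composed in a single render.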