🤖 AI Summary
Existing visual representation methods often suffer from spectral distortion and rendering inefficiency due to their reliance on frequency-guided heuristics or complex neural decoding. This paper introduces WIPES, a wavelet-based universal continuous visual primitive, presented as the first integration of wavelet transforms into visual primitive design. Leveraging the space-frequency localization property of wavelets, WIPES jointly models high-frequency details and low-frequency structural components without neural decoding, and employs differentiable rasterization for efficient, flexible frequency-domain control and real-time rendering. On 2D image representation and on 5D static and 6D dynamic novel view synthesis, WIPES achieves higher rendering quality than Gaussian primitives and outperforms implicit neural representations in both inference speed and fidelity. By unifying spectral efficiency with geometric expressiveness, WIPES advances both the efficiency and the representational capacity of continuous visual representations.
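The space-frequency localization the summary appeals to can be illustrated independently of the paper's primitive. The minimal sketch below (not WIPES itself, just a standard one-level 2D Haar wavelet transform implemented with NumPy) decomposes an image into a low-frequency approximation, the "forest," and three high-frequency detail subbands, the "trees," and shows that a sharp edge produces detail coefficients only near its spatial location:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2D Haar wavelet transform of an even-sized image.

    Returns (LL, LH, HL, HH): the low-frequency approximation and the
    three high-frequency detail subbands, each half the size per axis.
    The transform is orthonormal, so total energy is preserved.
    """
    s = 1.0 / np.sqrt(2.0)
    # Transform along rows: pairwise sums (low-pass) and differences (high-pass).
    lo = (img[:, 0::2] + img[:, 1::2]) * s
    hi = (img[:, 0::2] - img[:, 1::2]) * s
    # Transform along columns of each half.
    LL = (lo[0::2, :] + lo[1::2, :]) * s  # low rows, low cols: coarse structure
    HL = (lo[0::2, :] - lo[1::2, :]) * s  # detail sensitive to vertical variation
    LH = (hi[0::2, :] + hi[1::2, :]) * s  # detail sensitive to horizontal variation
    HH = (hi[0::2, :] - hi[1::2, :]) * s  # diagonal detail
    return LL, LH, HL, HH

# A vertical step edge: constant regions plus one localized high-frequency event.
img = np.zeros((8, 8))
img[:, 3:] = 1.0
LL, LH, HL, HH = haar_dwt2(img)
# LL carries the smooth plateau; the edge appears only in the LH subband,
# and only in the coefficient column covering the edge's spatial position.
```

Unlike a Fourier transform, where the same step edge would spread energy across all frequencies globally, the nonzero detail coefficients here pinpoint both the frequency band and the spatial location of the edge, which is the property the summary credits for capturing fine details and coarse structure jointly.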
📝 Abstract
Pursuing a continuous visual representation that offers flexible frequency modulation and fast rendering speed has recently garnered increasing attention in the fields of 3D vision and graphics. However, existing representations often rely on frequency guidance or complex neural network decoding, leading to spectrum loss or slow rendering. To address these limitations, we propose WIPES, a universal Wavelet-based vIsual PrimitivES for representing multi-dimensional visual signals. Building on the spatial-frequency localization advantages of wavelets, WIPES effectively captures both the low-frequency "forest" and the high-frequency "trees." Additionally, we develop a wavelet-based differentiable rasterizer to achieve fast visual rendering. Experimental results on various visual tasks, including 2D image representation, 5D static and 6D dynamic novel view synthesis, demonstrate that WIPES, as a visual primitive, offers higher rendering quality and faster inference than INR-based methods, and outperforms Gaussian-based representations in rendering quality.