Wave-PDE Nets: Trainable Wave-Equation Layers as an Alternative to Attention

📅 2025-10-05
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the high computational cost and weak physical interpretability of attention mechanisms and first-order state-space models. We propose Wave-PDE Nets, the first neural architecture that employs a differentiable second-order wave partial differential equation (PDE) as its fundamental layer. Hidden states propagate through a continuous medium governed by learnable, spatially varying wave-speed and damping fields, c(x) and γ(x), enabling global oscillatory dynamics as an alternative to explicit long-range dependency modeling. We theoretically establish that a single Wave-PDE layer possesses universal approximation capability. An FFT-based symplectic spectral solver ensures efficient O(n log n) inference. On language and vision benchmarks, Wave-PDE Nets match or surpass Transformer performance while reducing measured inference latency by 30% and peak memory usage by 25%, and improving training stability, achieving both a strong physics-informed inductive bias and superior computational efficiency.
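The propagation the summary describes can be sketched as a single spectral integration step: a Laplacian applied in Fourier space, then a Verlet-style update of the field and its velocity. This is a minimal numpy illustration, not the paper's implementation; the function names, the time step, and the explicit treatment of the damping term γ(x)·u_t are assumptions (the scheme is symplectic only for γ = 0).

```python
import numpy as np

def spectral_laplacian(u):
    # Laplacian via FFT: multiply by -k^2 in frequency space (O(n log n)).
    n = u.shape[-1]
    k = 2.0 * np.pi * np.fft.fftfreq(n)
    return np.fft.ifft(-(k ** 2) * np.fft.fft(u)).real

def wave_pde_step(u, v, c, gamma, dt=0.1):
    # One Verlet-style step of u_tt = c(x)^2 * Lap(u) - gamma(x) * u_t,
    # with u the hidden-state field and v = u_t its velocity.
    # Symplectic when gamma == 0; the damping term is treated explicitly here.
    a = c ** 2 * spectral_laplacian(u) - gamma * v
    v_half = v + 0.5 * dt * a
    u_next = u + dt * v_half
    a_next = c ** 2 * spectral_laplacian(u_next) - gamma * v_half
    v_next = v_half + 0.5 * dt * a_next
    return u_next, v_next
```

In this sketch, c and gamma play the role of the trainable fields: larger c(x) lets disturbances cross the sequence faster, while gamma(x) > 0 attenuates them locally.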

📝 Abstract
We introduce Wave-PDE Nets, a neural architecture whose elementary operation is a differentiable simulation of the second-order wave equation. Each layer propagates its hidden state as a continuous field through a medium with trainable spatial velocity c(x) and damping γ(x). A symplectic spectral solver based on FFTs realises this propagation in O(n log n) time. This oscillatory, global mechanism provides a powerful alternative to attention and first-order state-space models. We prove that a single Wave-PDE layer is a universal approximator. On language and vision benchmarks, Wave-PDE Nets match or exceed Transformer performance while demonstrating superior practical efficiency, reducing wall-clock time by up to 30% and peak memory by 25%. Ablation studies confirm the critical role of symplectic integration and a spectral Laplacian for stability and performance. Visualizations of the learned physical parameters reveal that the model learns intuitive strategies for information propagation. These results position Wave-PDE Nets as a computationally efficient and robust architecture with a strong physical inductive bias.
Problem

Research questions and friction points this paper is trying to address.

Proposes trainable wave-equation layers as an alternative to attention
Achieves Transformer-level performance with improved computational efficiency
Enables stable learning through symplectic integration and spectral methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Differentiable wave-equation simulation replaces attention layers
Trainable spatial velocity and damping fields control propagation
Symplectic spectral solver enables O(n log n) computational efficiency
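Putting the three innovations together, a toy forward pass might treat each channel of a (seq_len, d_model) hidden-state matrix as a 1-D field and propagate it for a few steps. Everything below (shapes, step count, the semi-implicit Euler update) is an assumed sketch of such a layer, not the authors' code; since each step is dominated by an FFT along the sequence axis, a fixed number of steps keeps the layer at O(n log n) in sequence length.

```python
import numpy as np

def wave_pde_layer(h, c, gamma, steps=4, dt=0.1):
    # h: hidden states, shape (seq_len, d_model); each column is a 1-D field.
    # c, gamma: per-position wave speed and damping, shape (seq_len,)
    # (plain arrays here, standing in for trained parameters).
    n = h.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n)[:, None]  # wavenumbers, broadcast over channels
    u = h.astype(float)
    v = np.zeros_like(u)                          # field velocity u_t, initialised to rest
    for _ in range(steps):
        # Spectral Laplacian along the sequence axis: O(n log n) per step.
        lap = np.fft.ifft(-(k ** 2) * np.fft.fft(u, axis=0), axis=0).real
        a = (c ** 2)[:, None] * lap - gamma[:, None] * v
        # Semi-implicit (symplectic for gamma=0) Euler: update v, then u with new v.
        v = v + dt * a
        u = u + dt * v
    return u
```

With c large at a position, information there spreads quickly across the sequence; with gamma large, it is absorbed, which is one intuition for how such learned fields could substitute for attention's explicit pairwise routing.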