AI Summary
This work proposes modeling the forward pass of feedforward ReLU neural networks as a cellular sheaf, where neurons correspond to vertices and computational steps are encoded as restriction maps on edges. Leveraging sheaf cohomology, the study establishes for the first time that forward propagation is equivalent to the unique harmonic extension over the sheaf. Building on this insight, the authors introduce a bidirectional information propagation mechanism grounded in the sheaf heat equation, enabling local error minimization and edge-wise diagnostics without backpropagation. Experiments on synthetic tasks demonstrate that the sheaf heat equation converges exponentially to the forward-pass output, validating the approach's feasibility and revealing quantitative scaling behaviors consistent with theoretical predictions.
Abstract
We construct a cellular sheaf from any feedforward ReLU neural network by placing one vertex for each intermediate quantity in the forward pass and encoding each computational step (affine transformation, activation, or output) as a restriction map on an edge. The restricted coboundary operator on the free coordinates is unitriangular, so its determinant is $1$ and the restricted Laplacian is positive definite for every activation pattern. It follows that the relative cohomology vanishes and the forward-pass output is the unique harmonic extension of the boundary data. The sheaf heat equation converges exponentially to this output despite the state-dependent switching introduced by piecewise linear activations. Unlike the forward pass, the heat equation propagates information bidirectionally across layers, enabling pinned neurons that impose constraints in both directions, training through local discrepancy minimization without a backward pass, and per-edge diagnostics that decompose network behavior by layer and operation type. We validate the framework experimentally on small synthetic tasks, confirming the convergence theorems and demonstrating that sheaf-based training, while not yet competitive with stochastic gradient descent, obeys quantitative scaling laws predicted by the theory.
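To make the harmonic-extension claim concrete, here is a minimal numerical sketch (not the authors' implementation) for a hypothetical two-layer ReLU network $y = W_2\,\mathrm{relu}(W_1 x)$. Each intermediate quantity ($z$, $a$, $y$) is a free vertex state, the input $x$ is pinned boundary data, and the heat flow is approximated by (sub)gradient descent on the sum of squared edge residuals; the matrices, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-layer ReLU network: y = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)                   # pinned boundary data (input vertex)

target = W2 @ np.maximum(W1 @ x, 0.0)    # ordinary forward-pass output

# Free vertex states: pre-activation z, post-activation a, output y
z = np.zeros(4)
a = np.zeros(4)
y = np.zeros(2)

def relu(v):
    return np.maximum(v, 0.0)

def energy(z, a, y):
    """Sum of squared residuals over the three edge types."""
    return (np.sum((z - W1 @ x) ** 2)
            + np.sum((a - relu(z)) ** 2)
            + np.sum((y - W2 @ a) ** 2))

eta = 0.01                               # step size (illustrative)
for _ in range(50_000):
    rz = z - W1 @ x                      # affine-edge residual
    ra = a - relu(z)                     # activation-edge residual
    ry = y - W2 @ a                      # output-edge residual
    gz = 2 * rz - 2 * ra * (z > 0)       # subgradient through the ReLU switch
    ga = 2 * ra - 2 * W2.T @ ry
    gy = 2 * ry
    z -= eta * gz
    a -= eta * ga
    y -= eta * gy

print(energy(z, a, y))                   # near zero after convergence
print(np.allclose(y, target, atol=1e-3))
```

The energy has a unique zero (the forward pass itself), so despite the state-dependent switching at $z = 0$ the flow settles on the harmonic extension, mirroring the abstract's convergence claim on a toy scale.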