🤖 AI Summary
This work addresses the lack of efficient numerical methods for simulating Brownian motion and performing Langevin sampling on metric graphs. We propose the first timestep-splitting Euler–Maruyama discretization scheme, which decouples intra-edge diffusion from vertex-jump dynamics; the resulting splitting strategy preserves both convergence and jump-probability consistency, overcoming the restrictive time-step constraints of conventional finite-volume methods. Leveraging custom CUDA kernels, we achieve highly parallel GPU acceleration: up to ~8,000× speedup on star graphs and ~1,500× over DuMuX on a realistic cortical vascular network, while also enabling significantly larger stable time steps. To our knowledge, this is the first scalable, accurate, and computationally efficient framework for simulating and sampling Brownian motion on metric graphs.
📝 Abstract
Metric graphs are structures obtained by associating each edge of a standard graph with a segment of the real line and gluing these segments together at the vertices of the graph. The resulting structure carries a natural metric that allows differential operators and stochastic processes to be studied on the graph. Brownian motions on these domains have been studied extensively through their generators, but less work has been done on practical algorithms for simulating these processes. We introduce the first algorithm for simulating Brownian motions on metric graphs, based on a timestep-splitting Euler–Maruyama discretization of the corresponding stochastic differential equation. By applying this scheme to Langevin diffusions on metric graphs, we also obtain the first algorithm for sampling on metric graphs. We provide theoretical guarantees on the number of timestep splittings required for the algorithm to converge to the underlying stochastic process, and we show that the exit probabilities of the simulated particle converge to the vertex-edge jump probabilities of the underlying stochastic differential equation as the timestep goes to zero. Finally, since the method is highly parallelizable, we provide fast, memory-aware implementations in the form of custom CUDA kernels that are up to ~8000x faster than a PyTorch GPU implementation on simple star metric graphs. Beyond star graphs, we benchmark our algorithm on a real cortical vascular network extracted from a DuMuX tissue-perfusion model for tracer transport. Our algorithm runs stable simulations with timesteps significantly larger than the stability limit of the finite-volume method used in DuMuX while achieving speedups of up to ~1500x.
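To make the setting concrete, the sketch below simulates Brownian motion on the simplest metric graph, a star graph, with an Euler–Maruyama step on each edge. The vertex behavior shown (a uniformly random choice of outgoing edge at the central vertex, i.e. Walsh Brownian motion with equal weights, and reflection at the outer endpoints) is an illustrative assumption, not the paper's splitting scheme, and the function name `simulate_star_walsh` is hypothetical:

```python
import numpy as np

def simulate_star_walsh(n_edges=3, edge_len=1.0, dt=1e-4, n_steps=2000,
                        n_particles=500, seed=0):
    """Minimal Euler-Maruyama sketch of Brownian motion on a star graph.

    State per particle: (edge index, distance x in [0, edge_len] from the
    central vertex). At the central vertex a new edge is drawn uniformly at
    random (equal edge weights -- an illustrative assumption); at the outer
    endpoints the particle reflects.
    """
    rng = np.random.default_rng(seed)
    edge = rng.integers(0, n_edges, size=n_particles)
    x = np.full(n_particles, 0.5 * edge_len)  # start mid-edge
    for _ in range(n_steps):
        # Intra-edge Euler-Maruyama step: dX = sqrt(dt) * N(0, 1).
        x += np.sqrt(dt) * rng.standard_normal(n_particles)
        # Crossed the central vertex: redraw the edge, reflect into it.
        hit_center = x < 0.0
        edge[hit_center] = rng.integers(0, n_edges, size=hit_center.sum())
        x[hit_center] = -x[hit_center]
        # Crossed an outer endpoint: reflect back into the edge.
        x = np.where(x > edge_len, 2.0 * edge_len - x, x)
    return edge, x
```

A Langevin variant for sampling would add a drift term `-grad_U(x) * dt` to the update, with the potential defined edge-by-edge.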