🤖 AI Summary
This book reviews the fundamental concepts behind differentiable programming, a paradigm that enables end-to-end differentiation of complex programs, including those with control flow and data structures, so that program parameters can be optimized with gradient-based methods. It draws on automatic differentiation (AD), graphical models, optimization, and statistics, and presents the material from two perspectives, optimization and probability, with analogies between the two. Key points include: (1) differentiable programming is not merely the differentiation of programs but also the deliberate design of programs intended for differentiation; (2) making a program differentiable inherently introduces probability distributions over its execution, which provides a way to quantify the uncertainty of its outputs; and (3) the paradigm is one of the drivers, alongside large models, vast datasets, and accelerated hardware, of recent advances in artificial intelligence.
📝 Abstract
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming. This new programming paradigm enables end-to-end differentiation of complex computer programs (including those with control flows and data structures), making gradient-based optimization of program parameters possible. As an emerging paradigm, differentiable programming builds upon several areas of computer science and applied mathematics, including automatic differentiation, graphical models, optimization and statistics. This book presents a comprehensive review of the fundamental concepts useful for differentiable programming. We adopt two main perspectives, that of optimization and that of probability, with clear analogies between the two. Differentiable programming is not merely the differentiation of programs, but also the thoughtful design of programs intended for differentiation. By making programs differentiable, we inherently introduce probability distributions over their execution, providing a means to quantify the uncertainty associated with program outputs.
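The closing claim of the abstract, that making a program differentiable introduces a probability distribution over its execution, can be illustrated with a small sketch. It is not taken from the book: the function names, the sigmoid gate, and the `temperature` parameter are assumptions chosen for illustration. A hard `if`/`else` has zero gradient with respect to its condition almost everywhere; replacing it with the expected branch value under a Bernoulli distribution gives a smooth function whose gradient is informative.

```python
import math


def sigmoid(z):
    # Logistic function written via tanh so large |z| cannot overflow.
    return 0.5 * (1.0 + math.tanh(0.5 * z))


def hard_branch(x, a=1.0, b=-1.0):
    # Ordinary control flow: piecewise constant in x, so d/dx is 0
    # everywhere except at the jump, where it is undefined.
    return a if x >= 0 else b


def soft_branch(x, a=1.0, b=-1.0, temperature=1.0):
    # Relaxed control flow (illustrative assumption): take branch `a`
    # with probability p = sigmoid(x / temperature) and branch `b`
    # otherwise, then return the expected value over that choice.
    p = sigmoid(x / temperature)
    return p * a + (1.0 - p) * b


def soft_branch_grad(x, a=1.0, b=-1.0, temperature=1.0):
    # Analytic derivative of soft_branch with respect to x,
    # using d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z)).
    p = sigmoid(x / temperature)
    return (a - b) * p * (1.0 - p) / temperature


def finite_difference(f, x, eps=1e-6):
    # Central finite difference, used only to check the gradients.
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    x = 0.3
    print("hard gradient:", finite_difference(hard_branch, x))
    print("soft gradient:", soft_branch_grad(x))
    print("soft gradient (numeric):", finite_difference(soft_branch, x))
```

Here the branch probability `p` also acts as a measure of uncertainty about which path the program takes: lowering `temperature` pushes `p` toward 0 or 1 and the soft program toward the hard one, at the cost of vanishing gradients away from the decision boundary.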