Parallel Dual-Numbers Reverse AD

📅 2022-07-07
📈 Citations: 9
Influential: 2
🤖 AI Summary
This work addresses task parallelism in dual-numbers reverse-mode automatic differentiation (AD) for functional languages, in particular Haskell, where previous dual-numbers reverse AD required sequentialization to construct the reverse pass. The approach pairs each scalar with a backpropagator function and combines this dual-numbers representation with linear factoring (inspired by Brunel, Mazza and Pagani) and mutable arrays, yielding an algorithm that has the correct complexity and can be implemented efficiently in a standard functional language. Contributions include: (i) an optimized dual-numbers reverse AD algorithm built from linear factoring plus well-known functional programming techniques; (ii) a practical implementation that differentiates most of Haskell98; and (iii) an extension to task-parallel source programs that generates a task-parallel derivative computation.
📝 Abstract
Where dual-numbers forward-mode automatic differentiation (AD) pairs each scalar value with its tangent value, dual-numbers reverse-mode AD attempts to achieve reverse AD using a similarly simple idea: by pairing each scalar value with a backpropagator function. Its correctness and efficiency on higher-order input languages have been analysed by Brunel, Mazza and Pagani, but this analysis used a custom operational semantics for which it is unclear whether it can be implemented efficiently. We take inspiration from their use of linear factoring to optimise dual-numbers reverse-mode AD to an algorithm that has the correct complexity and enjoys an efficient implementation in a standard functional language with support for mutable arrays, such as Haskell. Aside from the linear factoring ingredient, our optimisation steps consist of well-known ideas from the functional programming community. We demonstrate the use of our technique by providing a practical implementation that differentiates most of Haskell98. Where previous work on dual-numbers reverse AD has required sequentialisation to construct the reverse pass, we demonstrate that we can apply our technique to task-parallel source programs and generate a task-parallel derivative computation.
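The contrast drawn in the abstract between the two dual-numbers styles can be illustrated with a minimal Haskell sketch. This is not the paper's implementation: the names `Dual`, `Back`, `gradFwd`, `gradRev` and `example` are hypothetical, and the reverse-mode half is the naive starting point only, without linear factoring, mutable arrays, or parallelism. It assumes nothing beyond the Prelude.

```haskell
-- Illustrative sketch only. All names here are hypothetical and are not
-- taken from the paper's code.

-- Forward mode: each scalar is paired with its tangent value.
data Dual = Dual Double Double

instance Num Dual where
  Dual x dx + Dual y dy = Dual (x + y) (dx + dy)
  Dual x dx * Dual y dy = Dual (x * y) (x * dy + y * dx)
  negate (Dual x dx)    = Dual (negate x) (negate dx)
  abs (Dual x dx)       = Dual (abs x) (signum x * dx)
  signum (Dual x _)     = Dual (signum x) 0
  fromInteger n         = Dual (fromInteger n) 0

-- Reverse mode (naive): each scalar is paired with a backpropagator that
-- maps the adjoint of that scalar to a list of (input index, contribution)
-- pairs. Contributions to the same input are summed at the very end.
data Back = Back Double (Double -> [(Int, Double)])

instance Num Back where
  Back x bx + Back y by = Back (x + y) (\d -> bx d ++ by d)
  Back x bx * Back y by = Back (x * y) (\d -> bx (y * d) ++ by (x * d))
  negate (Back x bx)    = Back (negate x) (\d -> bx (negate d))
  abs (Back x bx)       = Back (abs x) (\d -> bx (signum x * d))
  signum (Back x _)     = Back (signum x) (const [])
  fromInteger n         = Back (fromInteger n) (const [])

-- Forward mode needs one pass per input: seed input i with tangent 1.
gradFwd :: ([Dual] -> Dual) -> [Double] -> [Double]
gradFwd f xs =
  [ tangent (f [ Dual x (if i == j then 1 else 0) | (j, x) <- zip [0 ..] xs ])
  | i <- [0 .. length xs - 1] ]
  where
    tangent (Dual _ dx) = dx

-- Reverse mode needs a single pass: call the output's backpropagator with
-- adjoint 1, then sum the contributions per input index.
gradRev :: ([Back] -> Back) -> [Double] -> [Double]
gradRev f xs =
  let inputs    = [ Back x (\d -> [(i, d)]) | (i, x) <- zip [0 ..] xs ]
      Back _ bp = f inputs
      contribs  = bp 1
  in [ sum [ c | (j, c) <- contribs, j == i ] | i <- [0 .. length xs - 1] ]

-- A small polymorphic test function: f(x, y) = x * y + x * x.
example :: Num a => [a] -> a
example [x, y] = x * y + x * x
example _      = 0
```

Evaluating `gradFwd example [3, 4]` and `gradRev example [3, 4]` both gives `[10.0,3.0]`, matching the hand-computed partial derivatives y + 2x and x. In the naive reverse version, the backpropagator of `x` is invoked once per use of `x`, so shared subterms are re-traversed; this is the kind of inefficiency that, going by the abstract, the linear-factoring optimisation is meant to address.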
Problem

Research questions and friction points this paper is trying to address.

Optimizing dual-numbers reverse-mode AD for efficient implementation.
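Achieving the correct complexity in a standard functional language with mutable arrays, such as Haskell.
Generating a task-parallel derivative computation from task-parallel source programs, without sequentialising the reverse pass.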
Innovation

Methods, ideas, or system contributions that make the work stand out.

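Dual-numbers reverse-mode AD optimized via linear factoring, combined with well-known functional programming techniques.
Efficient implementation in a standard functional language, demonstrated on most of Haskell98.
Task-parallel derivative computation generated from task-parallel source programs.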