A Promising Future: Omission Failures in Choreographic Programming

📅 2017-12-14
📈 Citations: 3
Influential: 0
🤖 AI Summary
This paper addresses the challenge of modeling realistic communication failures in choreographic programming: send and receive omissions by processes, and message loss in the network. It introduces the first choreographic theory with omission-failure semantics. The core technical innovation is the decoupling of send and receive primitives, which lets each endpoint of a communication specify its own failure-recovery strategy while statically guaranteeing that the two actions still correlate correctly, and which removes the need for synchronization over unreliable channels when starting a new communication. The theory supports the programming of failure-recovery strategies and a static analysis, called robustness, that checks at-most-once and exactly-once delivery guarantees. The design is validated through a series of examples, including two-phase commit, and through an implementation in the choreographic programming language Choral.
📝 Abstract
Choreographic programming promises a simple approach to the coding of concurrent and distributed systems: write the collective communication behaviour of a system of processes as a choreography, and then the programs for these processes are automatically compiled by a provably-correct procedure known as endpoint projection. While this promise prompted substantial research, a theory that can deal with realistic communication failures in a distributed network remains elusive. In this work, we provide the first theory of choreographic programming that addresses realistic communication failures taken from the literature of distributed systems: processes can send or receive fewer messages than they should (send and receive omission), and the network can fail at transporting messages (omission failure). Our theory supports the programming of strategies for failure recovery, and a novel static analysis (called robustness) to check for delivery guarantees (at-most-once and exactly-once). Our key technical innovation is a deconstruction of the usual communication primitive in choreographies to allow for independent implementations of the send and receive actions of a communication, while still retaining the static guarantee that these actions will correlate correctly (the essence of choreographic programming). This has two main benefits. First, each side of a communication can adopt its own failure recovery strategy, as in realistic protocols. Second, initiating new communications does not require any (unrealistic) synchronisation over unreliable channels: senders and receivers agree by construction on how each message should be identified. We validate our design via a series of examples (including two-phase commit, which so far eluded choreographic programming) and an implementation of our ideas in the choreographic programming language Choral.
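The decoupling described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's calculus and not Choral code: `LossyChannel`, `send_with_retries`, and `receive_once` are hypothetical names introduced here. The sketch assumes that both endpoints already share a message identifier (mirroring the idea that senders and receivers agree by construction on how a message is identified), that the sender's chosen strategy is blind retransmission, and that the receiver's chosen strategy is to deliver each identifier at most once.

```python
import random


class LossyChannel:
    """Hypothetical network that may drop any message (omission failure)."""

    def __init__(self, loss_rate, seed=0):
        self.loss_rate = loss_rate
        self.rng = random.Random(seed)
        self.arrived = {}  # message id -> payload that reached the receiver side

    def transmit(self, msg_id, payload):
        # The message is silently lost with probability loss_rate.
        if self.rng.random() >= self.loss_rate:
            self.arrived[msg_id] = payload

    def poll(self, msg_id):
        # Returns None when nothing with this id has arrived.
        return self.arrived.get(msg_id)


def send_with_retries(channel, msg_id, payload, attempts):
    """Sender-side strategy: retransmit a fixed number of times, no handshake."""
    for _ in range(attempts):
        channel.transmit(msg_id, payload)


def receive_once(channel, msg_id, delivered):
    """Receiver-side strategy: hand each message id to the application at most once."""
    if msg_id in delivered:
        return None  # duplicate of an already delivered message
    payload = channel.poll(msg_id)
    if payload is not None:
        delivered.add(msg_id)
    return payload
```

In this sketch the two functions never coordinate at run time: the sender may retransmit as often as it likes, and the receiver still delivers the payload at most once because duplicates carry the same identifier. If every transmission is lost, the receiver simply observes no message, which is the case a recovery strategy would have to handle.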
Problem

Research questions and friction points this paper is trying to address.

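Handle realistic communication failures (send omission, receive omission, and network omission) in choreographic programming
Support the programming of failure-recovery strategies within choreographies
Allow independent implementations of send and receive actions while statically guaranteeing that they correlate correctly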
Innovation

Methods, ideas, or system contributions that make the work stand out.

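Deconstructs the usual choreographic communication primitive into separate send and receive actions
Lets each side of a communication adopt its own failure-recovery strategy
Introduces a static analysis, called robustness, that checks at-most-once and exactly-once delivery guarantees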
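The idea that each side handles omissions on its own terms can be illustrated with textbook two-phase commit, the protocol the abstract names as an example. The Python sketch below follows the standard textbook treatment of lost messages and is not the paper's encoding; `coordinator_decide` and `participant_outcome` are hypothetical helper names, and a lost message is modeled simply as a missing dictionary entry or a `None` value.

```python
def coordinator_decide(votes, participants):
    """Coordinator-side policy: commit only if every vote arrived and is 'yes'.

    A vote lost to an omission failure is absent from `votes` and is
    treated like a 'no' vote.
    """
    all_yes = all(votes.get(p) == "yes" for p in participants)
    return "commit" if all_yes else "abort"


def participant_outcome(own_vote, decision):
    """Participant-side policy when the decision message may be lost.

    `decision` is None when the coordinator's message never arrived.
    """
    if decision is not None:
        return decision
    # A 'no' voter knows the global outcome can only be abort.
    # A 'yes' voter cannot decide alone and must remain uncertain.
    return "abort" if own_vote == "no" else "uncertain"
```

For example, `coordinator_decide({"p1": "yes"}, ["p1", "p2"])` yields `"abort"` because the vote from `p2` never arrived, while a participant that voted yes and received no decision stays `"uncertain"` until some recovery step delivers one.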