🤖 AI Summary
Existing time-complexity metrics for atomic snapshot protocols in asynchronous message-passing systems exhibit inconsistency when applied to long-running asynchronous algorithms, hindering fair cross-algorithm comparison of latency guarantees.
Method: The authors propose a unified operational latency metric tailored to asynchronous environments and design the first atomic snapshot protocol that simultaneously achieves theoretical optimality and practical robustness.
Contribution/Results: The protocol attains Ω(1) optimal latency under contention-free execution; maintains constant latency under contention; and, in fault-tolerant settings, exhibits worst-case latency linear in the number of actively concurrent faults while achieving constant amortized latency—near-optimal under adversarial scheduling. This work establishes a cohesive analytical framework for latency characterization of long-running asynchronous algorithms and introduces a new paradigm for time-complexity evaluation of distributed primitives.
📝 Abstract
This paper introduces a novel, fast atomic-snapshot protocol for asynchronous message-passing systems. In the process of defining what ``fast'' means exactly, we spot a few interesting issues that arise when conventional time metrics are applied to long-lived asynchronous algorithms. We reveal some gaps in latency claims made in earlier work on snapshot algorithms, which hamper their comparative time-complexity analysis. We then come up with a new unifying time-complexity metric that captures the latency of an operation in an asynchronous, long-lived implementation. This allows us to formally grasp latency improvements of our atomic-snapshot algorithm with respect to the state-of-the-art protocols: optimal latency in fault-free runs without contention, short constant latency in fault-free runs with contention, the worst-case latency proportional to the number of active concurrent failures, and constant, close to optimal, amortized latency.