This book presents the most important fault-tolerant distributed programming abstractions and their associated distributed algorithms, in particular in terms of reliable communication and agreement, which lie at the heart of nearly all distributed applications. These programming abstractions, distributed objects or services, allow software designers and programmers to cope with asynchrony and the most important types of failures such as process crashes, message losses, and malicious behaviors of computing entities, widely known under the term "Byzantine fault-tolerance". The author introduces these notions in an incremental manner, starting from a clear specification, followed by algorithms which are first described intuitively and then proved correct. The book also presents impossibility results in classic distributed computing models, along with strategies, mainly failure detectors and randomization, that allow us to enrich these models. In this sense, the book constitutes an introduction to the science of distributed computing, with applications in all domains of distributed systems, such as cloud computing and blockchains. Each chapter comes with exercises and bibliographic notes to help the reader approach, understand, and master the fascinating field of fault-tolerant distributed computing.
Understanding distributed computing is not an easy task. This is due to the many facets of uncertainty one has to cope with and master in order to produce correct distributed software. Considering the uncertainty created by asynchrony and process crash failures in the context of message-passing systems, the book focuses on the main abstractions that one has to understand and master in order to be able to produce software with guaranteed properties. These fundamental abstractions are communication abstractions that allow the processes to communicate consistently (namely the register abstraction and the reliable broadcast abstraction), and the consensus agreement abstraction, which allows them to cooperate despite failures. As they give a precise meaning to the words "communicate" and "agree" despite asynchrony and failures, these abstractions allow distributed programs to be designed with properties that can be stated and proved. Impossibility results are associated with these abstractions. Hence, in order to circumvent these impossibilities, the book relies on the failure detector approach, and, consequently, that approach to fault-tolerance is central to the book. Table of Contents: List of Figures / The Atomic Register Abstraction / Implementing an Atomic Register in a Crash-Prone Asynchronous System / The Uniform Reliable Broadcast Abstraction / Uniform Reliable Broadcast Abstraction Despite Unreliable Channels / The Consensus Abstraction / Consensus Algorithms for Asynchronous Systems Enriched with Various Failure Detectors / Constructing Failure Detectors
The research described in this report is presented in six parts: 1) On Interprocess Communication studies interprocess communication without assuming any lower-level communication primitives. A formalism is developed for reasoning about concurrent systems that does not assume an atomic grain of action; 2) The Intersecting Broadcast Machine is a novel array processor architecture, capable of efficiently processing programs whose arbitrary or complex structure would make them difficult to map onto conventional array processors. The architecture also supports fault-tolerant operation; 3) Broadcast Protocols for Distributed Systems considers how the broadcast character of communications media such as Ethernet and packet radio can be exploited to yield reliable communication with very little overhead; 4) Extending Interval Logic to Real Time Systems presents a technique for the formal expression of the real-time constraints that are critical to the specification of fault-tolerant distributed systems; 5) Consistency of Replicated Information in Multichannel Fault Tolerant Systems considers the possibility of using similar, but not identical, processing in the replicas of a fault tolerant system. Conventional fault tolerant systems using replicated processing require the replicas to be identical, so that they can be compared by exact match algorithms. This exact replication increases the risk that a common fault will affect all replicas and cause system failure; and 6) Experimental Implementation and Evaluation of the TRANS Broadcast Protocol describes an implementation and evaluation of the broadcast protocol outlined in Part III. Keywords: Multiprocessors.
Understanding distributed computing is not an easy task. This is due to the many facets of uncertainty one has to cope with and master in order to produce correct distributed software. A previous book, Communication and Agreement Abstraction for Fault-tolerant Asynchronous Distributed Systems (published by Morgan & Claypool, 2010), was devoted to the problems created by crash failures in asynchronous message-passing systems. The present book focuses on the way to cope with the uncertainty created by process failures (crash, omission failures and Byzantine behavior) in synchronous message-passing systems (i.e., systems whose progress is governed by the passage of time). To that end, the book considers fundamental problems that distributed synchronous processes have to solve. These fundamental problems concern agreement among processes (if processes are unable to agree in one way or another in the presence of failures, no non-trivial problem can be solved). They are consensus, interactive consistency, k-set agreement and non-blocking atomic commit. Being able to solve these basic problems efficiently with provable guarantees allows application designers to give a precise meaning to the words "cooperate" and "agree" despite failures, and to write distributed synchronous programs with properties that can be stated and proved. Hence, the aim of the book is to present a comprehensive view of agreement problems, algorithms that solve them and associated computability bounds in synchronous message-passing distributed systems.
Table of Contents: List of Figures / Synchronous Model, Failure Models, and Agreement Problems / Consensus and Interactive Consistency in the Crash Failure Model / Expedite Decision in the Crash Failure Model / Simultaneous Consensus Despite Crash Failures / From Consensus to k-Set Agreement / Non-Blocking Atomic Commit in Presence of Crash Failures / k-Set Agreement Despite Omission Failures / Consensus Despite Byzantine Failures / Byzantine Consensus in Enriched Models
Fault tolerance is an approach by which reliability of a computer system can be increased beyond what can be achieved by traditional methods. Comprehensive and self-contained, this book explores the information available on software supported fault tolerance techniques, with a focus on fault tolerance in distributed systems.
Both authors have taught the “Distributed Systems” course for many years at their respective schools. While teaching it, we have come to feel strongly that distributed systems have evolved from traditional LAN-based systems towards Internet-based systems. Although many excellent textbooks exist on this topic, the fast development of distributed systems and network programming/protocols has made it difficult to find an appropriate textbook for a distributed systems course oriented to the requirements of undergraduate-level study of today’s distributed technology, specifically one that runs from up-to-date concepts, algorithms, and models to implementations for both distributed system design and application programming. Thus the philosophy behind this book is to integrate the concepts, algorithm designs and implementations of distributed systems based on network programming. After using material from several other textbooks and research books, we found that many texts treat concepts, algorithm design and network programming separately, which makes it very difficult for students to map the concepts of distributed systems to algorithm design, prototyping and implementation. This book intends to enable readers, especially postgraduate and senior undergraduate students, to study up-to-date concepts, algorithms and network programming skills for building modern distributed systems. It enables students not only to master the concepts of distributed network systems but also to readily apply the material in implementation practice.
This book presents a comprehensive exploration of the practical issues, tested techniques, and accepted theory for developing fault tolerant systems. It is a ready reference to work already done in the field, with new approaches devised by the authors.
Distributed computing over multiple nodes is increasingly common in practical systems. Compared with classical single-node computation, distributed computing offers higher computing speeds over large data. However, the computation delay of the overall distributed system is controlled by its slower nodes, i.e., straggler nodes. Furthermore, if we want to run iterative algorithms such as gradient-descent-based algorithms, communication cost becomes a bottleneck. Therefore, it is important to design coded strategies that are resilient to these straggler nodes and, at the same time, communication-efficient. Recent work has developed coding-theoretic approaches to add redundancy to distributed matrix-vector multiplications with the goal of speeding up the computation by mitigating the straggler effect in distributed computing. First, we consider the case where the matrix comes from a small (e.g., binary) alphabet, where a variant of a popular method called the "Four-Russians method" is known to have significantly lower computational complexity as compared with the usual matrix-vector multiplication algorithm. We develop novel code constructions that are applicable to binary matrix-vector multiplication via a variant of the Four-Russians method called the Mailman algorithm. Specifically, in our constructions, the encoded matrices have a small alphabet that ensures lower computational complexity, as well as good straggler tolerance. We also present a trade-off between the communication and computation cost of distributed coded matrix-vector multiplication for general, possibly non-binary, matrices. Second, we provide novel coded computation strategies, called MatDot, for distributed matrix-matrix products that outperform the recent "Polynomial code" constructions in recovery threshold, i.e., the required number of successful workers, at the cost of higher computation cost per worker and higher communication cost from each worker to the fusion node.
We also demonstrate a novel coding technique for multiplying $n$ matrices ($n \geq 3$) using ideas from MatDot codes. Third, we introduce the idea of cross-iteration coded computing, an approach to reducing communication costs for a large class of distributed iterative algorithms involving linear operations, including gradient descent and accelerated gradient descent for quadratic loss functions. The state-of-the-art approach for these iterative algorithms involves performing one iteration of the algorithm per round of communication among the nodes. In contrast, our approach performs multiple iterations of the underlying algorithm in a single round of communication by incorporating some redundant storage and computation. Our algorithm works in the master-worker setting with the workers storing carefully constructed linear transformations of input matrices and using these matrices in an iterative algorithm, with the master node inverting the effect of these linear transformations. In addition to reduced communication costs, a trivial generalization of our algorithm also includes resilience to stragglers and failures as well as Byzantine worker nodes. We also show a special case of our algorithm that trades off between communication and computation. The degree of redundancy of our algorithm can be tuned based on the amount of communication and straggler resilience required. Moreover, we also describe a variant of our algorithm that can flexibly recover the results based on the degree of straggling in the worker nodes. The variant allows for the performance to degrade gracefully as the number of successful (non-straggling) workers is lowered. Communication overhead is one of the key challenges that hinder the scalability of distributed optimization algorithms to train large neural networks.
In recent years, there has been a great deal of research to alleviate communication cost by compressing the gradient vector or using local updates and periodic model averaging. The next direction of this thesis is to advocate the use of redundancy towards communication-efficient distributed stochastic algorithms for non-convex optimization. In particular, we show, both theoretically and in practice, that by properly infusing redundancy into the training data with model averaging, it is possible to significantly reduce the number of communication rounds. To be more precise, we show that redundancy reduces residual error in local averaging, thereby reaching the same level of accuracy with fewer rounds of communication as compared with previous algorithms. Empirical studies on CIFAR10, CIFAR100 and ImageNet datasets in a distributed environment complement our theoretical results; they show that our algorithms have additional beneficial aspects including tolerance to failures, as well as greater gradient diversity. Next, we study local distributed SGD, where data is partitioned among computation nodes, and the computation nodes perform local updates, periodically exchanging the model among the workers to perform averaging. While local SGD is empirically shown to provide promising results, a theoretical understanding of its performance remains open. We strengthen convergence analysis for local SGD, and show that local SGD can be far less expensive and applied far more generally than current theory suggests. Specifically, we show that for loss functions that satisfy the Polyak-Łojasiewicz (PL) condition, $O((pT)^{1/3})$ rounds of communication suffice to achieve a linear speed up, that is, an error of $O(1/pT)$, where $T$ is the total number of model updates at each worker. This is in contrast with previous work, which required a higher number of communication rounds and was limited to strongly convex loss functions, for a similar asymptotic performance.
We also develop an adaptive synchronization scheme that provides a general condition for linear speed up. We also validate the theory with experimental results, running over AWS EC2 clouds and an internal GPU cluster. In the final section, we focus on federated learning, where communication cost is often a critical bottleneck to scale up distributed optimization algorithms to collaboratively learn a model from millions of devices with potentially unreliable or limited communication and heterogeneous data distributions. Two notable trends to deal with the communication overhead of federated algorithms are gradient compression and local computation with periodic communication. Despite many attempts, characterizing the relationship between these two approaches has proven elusive. We address this by proposing a set of algorithms with periodically compressed (quantized or sparsified) communication and analyze their convergence properties in both homogeneous and heterogeneous local data distribution settings. For the homogeneous setting, our analysis improves existing bounds by providing tighter convergence rates for both strongly convex and non-convex objective functions. To mitigate data heterogeneity, we introduce a local gradient tracking scheme and obtain sharp convergence rates that match the best-known communication complexities without compression for convex, strongly convex, and nonconvex settings. We complement our theoretical results by demonstrating the effectiveness of our proposed methods on real-world datasets.
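To make the MatDot idea mentioned in the abstract above more concrete, here is a minimal sketch of the construction as it is commonly described in the coded-computing literature. It is an illustration under stated assumptions, not the thesis's implementation. A is split column-wise and B row-wise into m blocks, each worker multiplies one evaluation of two matrix polynomials, and the product AB is the coefficient of x^(m-1) in the resulting polynomial of degree 2m-2. Any 2m-1 finished workers therefore suffice. All function names, the evaluation points, and the use of exact rational arithmetic (instead of a finite field or floating point) are assumptions made for this example.

```python
from fractions import Fraction


def mat_mul(X, Y):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]


def lin_comb(mats, weights):
    """Weighted sum of equally shaped matrices."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(w * M[i][j] for M, w in zip(mats, weights)) for j in range(cols)]
            for i in range(rows)]


def encode(A, B, m, x):
    """Return (p_A(x), p_B(x)), the coded pair sent to the worker at point x.

    p_A(x) = sum_i A_i x^i with A split column-wise,
    p_B(x) = sum_j B_j x^(m-1-j) with B split row-wise.
    Assumes the inner dimension is divisible by m.
    """
    width = len(B) // m
    A_blocks = [[row[i * width:(i + 1) * width] for row in A] for i in range(m)]
    B_blocks = [B[i * width:(i + 1) * width] for i in range(m)]
    pA = lin_comb(A_blocks, [x ** i for i in range(m)])
    pB = lin_comb(B_blocks, [x ** (m - 1 - j) for j in range(m)])
    return pA, pB


def lagrange_coeff(points, k, degree):
    """Coefficient of x**degree in the k-th Lagrange basis polynomial."""
    coeffs = [Fraction(1)]  # the constant polynomial 1, lowest degree first
    denom = Fraction(1)
    for j, xj in enumerate(points):
        if j == k:
            continue
        # Multiply the running polynomial by (x - xj).
        new = [Fraction(0)] * (len(coeffs) + 1)
        for d, c in enumerate(coeffs):
            new[d] -= c * xj
            new[d + 1] += c
        coeffs = new
        denom *= points[k] - xj
    return coeffs[degree] / denom


def decode(finished, m):
    """Recover A @ B from any 2m-1 finished workers (dict: point -> product)."""
    if len(finished) < 2 * m - 1:
        raise ValueError("need at least 2m-1 finished workers")
    points = sorted(finished)[:2 * m - 1]
    # Interpolate only the coefficient of x^(m-1), which equals sum_i A_i B_i.
    weights = [lagrange_coeff(points, k, m - 1) for k in range(len(points))]
    return lin_comb([finished[x] for x in points], weights)


# Hypothetical run: five workers, m = 2, so any three results are enough.
A = [[1, 2, 3, 4], [5, 6, 7, 8]]        # 2 x 4
B = [[1, 0], [2, 1], [0, 3], [4, 1]]    # 4 x 2
m = 2
points = [1, 2, 3, 4, 5]
products = {x: mat_mul(*encode(A, B, m, x)) for x in points}
finished = {x: products[x] for x in (1, 3, 4)}  # workers at 2 and 5 straggle
recovered = decode(finished, m)
```

In this sketch each worker returns a full-size product matrix, which reflects the trade-off stated in the abstract: a lower recovery threshold in exchange for higher per-worker computation and higher communication to the fusion node.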
The growing complexity of modern software systems makes it increasingly difficult to ensure the overall dependability of software-intensive systems. Mastering system complexity requires design techniques that support clear thinking and rigorous validation and verification. Formal design methods together with fault-tolerant design techniques help to achieve this. Therefore, there is a clear need for methods that enable rigorous modeling and the development of complex fault-tolerant systems. This book is an outcome of the workshop on Methods, Models and Tools for Fault Tolerance, MeMoT 2007, held in conjunction with the 6th International Conference on Integrated Formal Methods, iFM 2007, in Oxford, UK, in July 2007. The authors of the best workshop papers were asked to enhance and expand their work, and a number of well-established researchers working in the area contributed invited chapters in addition. Of the 15 refereed and revised papers presented, 12 are versions reworked from the workshop and 3 are invited papers. The articles are organized in four topical sections on: formal reasoning about fault-tolerant systems and protocols; fault tolerance: modelling in B; fault tolerance in system development process; and fault-tolerant applications.