Winter 2013
UC Berkeley
This course provides basic theoretical and practical foundations of distributed systems. Students learn about system models, safety and liveness of protocols, different failure models, reliable group communication abstractions, and more. It follows a textbook, supplemented by lectures based on research papers.
In the past decade, ever more applications and services that previously ran on local PCs have moved to the Internet, where they run in data centers and are accessed through the Web. This puts distributed systems at the center of many of today's application architectures. The field of distributed systems (or distributed computing) concerns systems in which many nodes (machines) solve a common problem by passing messages over a network that connects them. The aim of this course is to establish familiarity with the basic theoretical and practical foundations of distributed systems.
Distributed computing is challenging because of two fundamental problems: (i) partial failure and (ii) asynchrony. Partial failure means that parts of the system (the network or individual machines) can be faulty, yet it is desirable for the rest of the system to keep functioning correctly. Asynchrony arises from variation in the time it takes to send messages between computers and in the operating speeds of different computers. It is therefore desirable for the system to function correctly even though events happen asynchronously.
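To make the interplay between the two problems concrete, here is a minimal Python sketch of a timeout-based heartbeat check. It is a toy model assumed purely for illustration: the names `Node`, `heartbeat_arrival`, and `suspects` are hypothetical and not taken from the course or textbook. Each node's heartbeat reaches a monitor after some delay, or never if the node has crashed, and the monitor suspects any node whose heartbeat has not arrived by the timeout.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    crashed: bool  # a crashed node never sends its heartbeat
    delay: float   # assumed message + processing delay of its heartbeat (seconds)


def heartbeat_arrival(node: Node) -> Optional[float]:
    """Time at which the monitor receives this node's heartbeat, or None if never."""
    return None if node.crashed else node.delay


def suspects(nodes: List[Node], timeout: float) -> Set[str]:
    """Suspect every node whose heartbeat has not arrived by `timeout`."""
    suspected = set()
    for n in nodes:
        t = heartbeat_arrival(n)
        if t is None or t > timeout:
            suspected.add(n.name)
    return suspected


nodes = [
    Node("a", crashed=False, delay=0.05),
    Node("b", crashed=False, delay=2.0),  # alive but slow
    Node("c", crashed=True, delay=0.0),   # crashed
]

print(sorted(suspects(nodes, timeout=1.0)))  # ['b', 'c']
print(sorted(suspects(nodes, timeout=5.0)))  # ['c']
```

With a 1-second timeout the monitor suspects both the slow node `b` and the crashed node `c`; from the monitor's point of view they look the same. A longer timeout clears `b` in this example, but in an asynchronous system no fixed timeout is guaranteed to exceed every message delay, so a slow node can always be mistaken for a failed one.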
Over the years, many recurring problems have been studied with respect to these two challenges, and many abstractions have been proposed that simplify dealing with them when building distributed systems. In this course we will study many of these problems and abstractions.
We will loosely follow the textbook below, supplemented by lectures based on research papers:
Introduction to Reliable and Secure Distributed Programming, C. Cachin, R. Guerraoui, L. Rodrigues, Springer, 2011.