Distributed Systems Reading Group

Jun 26, 2016
Scalability! But at what COST?

After a long break, we made our first paper Scalability! But at what COST? by Frank McSherry, Michael Isard, and Derek Murray. It is fairly short paper published in HotOS’15 which raises some interesting questions about distributed systems research, and the focus on scalability as the holy grail of performance.
Jan 10, 2014
EPaxos

EPaxos is a leaderless Paxos variant which tries to reduce latencies for a geo-distributed replica group by enabling the client to use the replica with the lowest round-trip latency as the operation leader, and optimistically skipping a round of replica communication by inter-operation conflict detection
Dec 16, 2013
Sinfonia

Sinfonia is a service that allows hosts to share application data in a fault-tolerant, scalable, and consistent manner using a novel mini-transaction primitive. We read this paper because it provides an interesting alternative to message passing for building distributed systems.
Sep 30, 2013
Discretized Streams

Discretized Streams: Fault Tolerant Computing at Scale describes additions to the Spark system to handle streaming data. Compared to other streaming systems, Spark Streaming offers a more robust fault recovery and straggler handling strategies using the Resilient Distributed Dataset (RDD) memory abstraction. In addition to allowing parallel recovery, Spark Streaming is one of the first systems which can incorporate batch and interactive query models all within the same system.
Sep 30, 2013
SPANStore

Several cloud providers provide storage in many data centers globally, and customers can use simple PUTs and GETs to store and retrieve data without dealing with the complexities of the storage infrastructure. However, in reality, every storage system leaves replication across data centers to the application, and although replication across all data centers provides low latency, it is expensive.
Aug 8, 2013
Chain Replication

Chain Replication is a paper from ‘04 by Renesse and Schneider. The system is interesting, because it is a primary-backup system with an unconventional architecture that aimed to achieve high throughput and availability while maintaining strong consistency.
Jul 25, 2013
MDCC

There is a write performance trade off when consistently replicating across multiple data centers due to the high latency when sending messages between data centers (often > 100ms). Existing protocols use forms of two phase commit which incur 3 blocking round trips between data centers.
Jul 18, 2013
CoralCDN

CoralCDN is a system that allows small websites or those with limited resources a method for remaining available in the face of flash-crowds. The system is interesting for a number of reasons: it is a live running system that the public can use, it’s peer to peer, self organizing, and also has a follow-up paper that analyzes the system with five years of hindsight.
Jul 11, 2013
Zookeeper

Zookeeper is a practical system with replicated storage used by Yahoo!. We are interested in understanding what replication protocol Zookeeper uses, why they needed a new replication protocol, what applications/services people build upon it, and what features are required to make such a replicated storage system practical.
Jul 3, 2013
Thialfi

Several of the papers we’ve read recently have focused on sophisticated, generic fault tolerance abstractions based on complex protocols. Thialfi offers a contrast: its approach to fault tolerance is intentionally simple, while at the same time being resilient to arbitrary (halting) failure, including entire data centers. Thialfi’s approach to fault tolerance permeates the design of its abstraction, unlike Raft and VR, which provide general-purpose state machine replication.
Jun 20, 2013
VR Revisited

Viewstamped Replication is a mechanism for providing replication through a Primary / Backup scheme. This paper provides a distilled view of this technique along with several optimizations that can be applied. In particular, this paper focuses solely on the Viewstamped Replication protocol, without looking at any specific implementation or uses.
Jun 13, 2013
Cheriton and Skeen

Though from 1993, in its time this paper sparked some controversy, provoking an impassioned response. We wanted to understand the debate about the question of providing ordering guarantees as part of the network.
Jun 6, 2013
PacificA

In the PacificA paper, the authors describe very clearly how to properly implement a primary/backup replicated storage layer with strong consistency.
May 30, 2013
Spanner

Spanner is a highly distributed, externally consistent database developed by Google. It provides replication and transactions over a geographically distributed set of servers. Spanner uses time bounds, Paxos, and two-phase commit to ensure external consistency.
May 23, 2013
Raft

Raft is a new consensus algorithm that is optimized for “ease of implementation”. Its main purpose is to present a protocol that is more understandable than Paxos, which, for many practitioners, is difficult to implement correctly. Viewstamped Replication is more similar to Raft, however it is far less popular than Paxos, so it is unfortunately not focused on in the paper.