CoralCDN is a system that allows small websites or those with limited resources a method for remaining available in the face of flash-crowds. The system is interesting for a number of reasons: it is a live running system that the public can use, it’s peer to peer, self organizing, and also has a follow-up paper that analyzes the system with five years of hindsight.
What is CoralCDN
CoralCDN is a content distribution network (CDN) built on top of the Coral indexing service. The system can be described in its component parts: Coral DNS services, Coral HTTP proxy, and the Coral indexing service.
Many of CoralCDN’s processes attempt to preserve locality in order to keep latency low. This starts with the DNS lookup by the end user’s browser. CoralCDN assumes that the resolver for the user’s browser is close to the end user on the network. It then picks a Coral HTTP proxy close to the source address of the DNS request.
Specifically, CoralCDN keeps track of the latencies between the proxies and the user and creates levels. The Coral DNS server can get a round trip time to the end user and only provide listings to proxies that fall within a given level.
The DNS servers are also responsible for ensuring that the HTTP proxies that are returned to the end user are available, since a bad HTTP proxy will fail the user’s request. The DNS servers can do this by synchronously checking a proxy’s status with an RPC prior to releasing the DNS response.
The HTTP Proxy
The driving goal of the system is to shield origin servers from request stampedes. The Coral proxies therefore attempt to find the content within CoralCDN before going to the origin server. Additionally, when a proxy begins retrieval from the origin server, it inserts a short lived record into the Coral index so any additional servers that want the object will not simultaneously contact the origin.
The Indexing Service
One of the authors’ stated contributions of this work is the indexing system, including a distributed sloppy hash table (DSHT). Each node is given a 160 bit ID and holds key/value pairs with keys that are “close” to the node’s ID. Additionally, each node maintains a routing table to other nodes. On a put or get, the request can be routed to successively closer nodes to toward the closest node.
The service is considered sloppy because a key/value is not always stored at the closest node. The put process is completed in two phases. The first is a collection of hops to get to the closest node and forms a stack from the list of nodes. This phase ends when the storing node reaches the closest node or when it reaches one that is full and loaded. A full node is one that already stores values for that key. A loaded node is one that is receiving more than requests per minute. In the second phase, the storing node attempts to place the value at the node on top of the stack. If that store fails for some reason, the storer pops the stack and tries again.
Nodes are arranged in clusters based on their average pairwise latency. If that latency is below a certain threshold, the cluster is said to be part of a specific level. A node can be part of multiple clusters but uses the same ID in each. This clustering allows a sequential search for a target value, beginning with low latency “close” nodes and only proceeding to slower levels if needed.
The idea of levels was interesting mostly because the latency thresholds allow the clusters to be self-organizing. Clearly, locality is important to a CDN. It does seem however, that simple static organization of nodes into clusters would be sufficient to allow latency locality.
Though we didn’t technically read the follow-up paper, there is an interesting section on lessons learned. In the section on usage, the authors found that a small number of popular URLs account for a large percentage of the requests but that these requests take up relatively little room on the cache. This means that more than 70% of requests are served from local cache, with only 7.1% retrieved from other Coral proxies.