Paper #: 13 (P2P) Title: A Scalable Content-Addressable Network (1) Problems The authors wanted to make a scalable peer-to-peer file distribution system; but they realized that the hard problem was not distributing the files themselves, but distributing (decentralizing) the indexing information - the relationship between keys and file locations. The paper describes the CAN, content-addressable network, which is like a hash table for large distributed systems. Thus, for Internet-scale file distribution systems (or other systems that need such an indexing scheme), CANs provide a scalable, fault-tolerant, self-organizing, low-latency hash table-like indexing system. DNS and IP routing provides such functionality by the use of a rigid hierarchical naming structure; the described CAN doesn't need such a structure. A big problem addressed and solved in the paper is the latency required in locating a key in the CAN. The authors wanted to get this latency down to close to the latency due to IP routing through the underlying Internet nodes. They did that, using various techniques to optimize the base architecture, thereby getting the CAN latency to less than twice the IP latency. (2) New Ideas and Strengths Identifying the new ideas of CANs can be done by comparing to Plaxton's algorithm, which accomplishes similar goals. However, CANs improve on Plaxton's alg. in that they: are self-configuring; are capable of handling millions of hosts (instead of thousands), many of which can be "flaky" (unstable and unpredictable); and, allow nodes to independently discover their neighbors in a decentralized manner. CANs divide a d-dimensional Cartesian coordinate space in zones, where each node in the system owns a zone. A deterministic hash function is used, that hashes key values into points in the coordinate space. If a key hashes to a zone owned by node A, the (key-value) pair is stored on A. To find the value associated with a key, you hash the key to a Cartesian point P, then find the node that owns the zone in which P resides. If neither the searching node nor its neighbors own that zone, they route the request to the neighbor that is in the same direction as point P is from the searching node. Several improvements to the base design are proposed (and simulated), that lower the latency in routing through the CAN to find the associated value. The strength of the paper is that it carefully analyzes and simulates to determine the impact of many different optimizations on the base CAN design. The simulation/analysis results show that it's possible to lower the latency of a CAN to reasonably close to the IP latency of the underlying nodes, which seems to indicate the CAN is practical to actually use. However, there are security issues such as how to prevent denial-of-service attacks. The paper explains a pretty complex math-based system in relatively easy to understand terms. (3) Weaknesses and Extensions This is hard stuff and it's hard for me to see weaknesses in it. The paper may be written in a way that's a little too biased in favor of its desired conclusions; e.g. it says "overloading zones adds somewhat to system complexity because nodes must additionally track a set of peers." It might be better to state the increase in complexity in concrete terms and avoid biased language that perhaps under emphasizes the additional complexity. The goal of building a distributed file-sharing web application wasn't achieved in this paper, as the authors admit; their paper just uses simulation results to verify the CAN design. It may be that the CAN design won't work nearly as well in the real-world. Also, the problem of preventing denial-of-service attacks, while acknowledged as "particularly hard," may even make the whole idea of CANs unusable until it's solved. (Of course, the work that has gone into this paper on CANs would probably not be a wasted effort, even if the denial-of-service problem took a long time to solve.) -- END --