Paper:    Distributed Streaming: The Power of Communication. Atish Das Sarma, Danupon Nanongkai. (Manuscript).

Abstract:   Managing distributed data streams has become an important issue in this era of massive data and cloud computing. Several models have been proposed for the study of algorithms dealing with such data. Most applications use some variant of what was defined as the MUD model [Feldman et al SODA'08]: Data is aggregated on a tree from leaf nodes receiving the stream input to the root that finally outputs the solution. The nodes never communicate as they process the stream input, and only transfer data to parent nodes. The simplicity and power of such models make them popular, particularly in programming paradigms such as MapReduce, Dryad, Hadoop and also those in distributed database and sensor network applications. However, some natural problems do not adapt to this model due to the communication restriction.

In this paper, we propose a model with a novel scheme of restricted communication: SIBS (Slow Input Broadcast Streaming). Machines that are processing streams may communicate by broadcasting; however, we allow a machine to broadcast only right after it processes a stream element. Moreover, we assume that at most one machine receives an input element at each time step, and there is no delay in communication. We justify these assumptions by proving that if a problem can be solved efficiently in SIBS, then it can be solved efficiently in a model where inputs arrive arbitrarily and there are communication delays. These assumptions make it easy to design distributed streaming algorithms for SIBS; parallelization and synchronization are handled automatically. Moreover, SIBS is more powerful than MUD as communication is allowed while processing streams. We consider the problem of computing weighted and unweighted maximum matchings on $n$-node graphs whose edges are stored in a distributed database architecture. We present algorithms with constant factor guarantees on SIBS with $O(n)$ edges of broadcast communication. To contrast this, we show that no algorithm on MUD can achieve a constant factor with $O(n)$ edges of communication. We believe that the SIBS model will lead to the development of more efficient distributed streaming infrastructure and algorithms in the future.

Back to All Publications