Keddah: Capturing Hadoop Network Behaviour
Jie Deng, Gareth Tyson, Félix Cuadrado and Steve Uhlig
Queen Mary University of London, Queen Mary, Queen Mary University of London, Queen Mary University of London

As a platform for processing big data across multiple machines, Hadoop relies heavily on the network to perform. Despite this, the behaviour of network traffic in Hadoop clusters is still poorly understood. This lack of understanding makes it difficult to explore and evaluate network-based Hadoop innovations. In this paper, we explore Hadoop traffic and present Keddah, a toolchain for capturing, modelling and reproducing Hadoop traffic for use in simulators. This paper provides researchers with an understanding of Hadoop network activity, as well as the means to recreate the traffic for profiling network components.