S3: Joint Scheduling and Source Selection for Background Traffic in Erasure-Coded Storage
Shijing Li, Tian Lan, Moo-Ryong Ra and Rajesh Panta
George Washington University, George Washington University, AT&T Labs Research, AT&T Labs Research

Erasure-coded storage systems have gained considerable adoption recently since they can provide the same level of reliability with significantly lower storage overhead compared to replicated systems. However, background traffic of such systems – e.g. repair, rebalance, backup and recovery traffic – often has large volume and consumes significant network resources. Independently scheduling such tasks and selecting their sources can easily create interference among data flows, causing severe deadline violation. We show that the well-known heuristic scheduling algorithms fail to consider important constraints, thus resulting in unsatisfactory performance. In this paper, we claim that an optimal scheduling algorithm that aims to maximize the number of background tasks completed before deadlines must simultaneously consider deadline-aware scheduling, network topology, chunk placement, and time-varying resource availability. This optimization is shown to be NP-hard. To resolve this problem, we propose a novel algorithm, called Linear Programming for Selected Tasks (LPST) to maximize the number of successful tasks and improve overall utilization of the datacenter network. It jointly schedules tasks and selects their sources based on a notion of Remaining Time Flexibility, which measures the slackness of the starting time of a task. We evaluated the efficacy of our algorithm using extensive simulations and validate the results with experiments in a real cloud environment. Our results show that, under certain scenarios, LPST can perform 7x~70x better than the heuristics which blindly treat the infrastructure as a collection of homogeneous resources, and 46.6%~65.9% better than the algorithms that take into account the network topology.