Ph.D. Thesis Proposal: Mukil Kesavan

Date: November 20, 2012 11:00 am - 1:00 pm
Location: KACB 2100

Ph.D. Thesis Proposal
Title: Practical Elastic Resource Allocation in Cloud Datacenters

Mukil Kesavan
School of Computer Science
College of Computing
Georgia Institute of Technology

Date: Tuesday, November 20, 2012
Time: 11:00 AM - 1:00 PM EST
Location: Klaus 2100

Committee:

  • Dr. Karsten Schwan, School of Computer Science (Advisor)
  • Dr. Ada Gavrilovska, School of Computer Science (Advisor)
  • Dr. Douglas M. Blough, School of Electrical & Computer Engineering
  • Dr. Ravi Soundararajan, VMware Inc.

Abstract:
Elastic resource allocation systems are critical to managing application performance and infrastructure utilization in datacenters. Architecting a realistic solution for multiplexing capacity across an entire datacenter, at the scale of thousands of machines, poses many challenges. Foreseeable issues such as the scalability, overhead, and accuracy of allocation methods must be overcome. In addition, there are unavoidable practical issues at scale that can seriously constrain the achievable benefits, particularly (i) the variable cost of carrying out the management actions necessary to enforce allocations and (ii) failures in those management actions. Further, the uncertainty in resource allocation caused by these factors results in unpredictable application-level performance.

This thesis addresses the problem of realizing a scalable elastic resource allocation system that can work well in real-life environments with the aforementioned practical constraints. We make the following concrete contributions:

* A detailed analysis of the impact of management failures and cost variability on elastic resource allocation efficiency. This includes new metrics with which to evaluate and compare the performance of similar systems.

* The design and construction of a hierarchical datacenter-wide resource allocation system that can dynamically trade off allocation accuracy against management overhead while adapting to the constraints imposed by observed failure and variability behavior. Our system is currently deployed on a 700-server datacenter virtualized with the VMware vSphere platform.

* The co-design of the elastic resource allocation system and an application framework to reduce end performance variability.

* A cloud workload suite consisting of a load generation framework and popular cloud codes that can be used to generate realistic datacenter-wide resource usage scenarios. This enables accurate evaluation of the target system.