
Optimizing for Tail Sojourn Times of Cloud Clusters

A common pitfall when hosting applications on today's cloud environments is that virtual servers often experience varying execution speeds due to interference from co-located virtual servers, degrading the tail sojourn times specified in service level agreements. Motivated by the significance of tail sojourn times for cloud clusters, we develop a model of N parallel virtual server queues, each of which processes jobs in a processor sharing fashion under varying execution speeds governed by Markov-modulated processes. We derive the tail distribution of the workload at each server and an approximation for the tail sojourn times based on large deviation analysis. Furthermore, we optimize the cluster sizes that fulfill the requirements of target tail sojourn times. Extensive simulation experiments show very good matches to the derived analysis in a variety of scenarios, e.g., large numbers of servers experiencing many different execution speeds, under various traffic intensities, workload variations and cluster sizes. Finally, we apply our proposed analysis to estimating the tail sojourn times of a Wikipedia system hosted in a private cloud, and the testbed results strongly confirm the applicability and accuracy of our analysis.

Existing System:

The degradation of tail sojourn times in the cloud is further exacerbated when deploying cluster-based applications, i.e., applications relying on a large number of VMs. Web and big data services are typical examples requiring such cluster deployments. In addition to the modulated execution speed and cluster size, the distribution of sojourn times, particularly the tail, is also affected by the load balancing algorithm distributing the load across VMs and by the processor scheduling mechanism at each VM. Typically, a simple round-robin algorithm is widely adopted, such as the one used in the Amazon EC2 cloud. Requests are executed in a Processor Sharing (PS) fashion on individual VMs, which are typically hosted on separate physical servers. Overall, when deploying application clusters on today's cloud, three aspects are crucial for capturing the distribution of workloads and sojourn times: the modulated execution speeds of VMs, the load balancing algorithm, and the processor scheduling. Our research here aims to address the challenging question of how to best dimension the size of a cluster deployed in a cloud, in terms of the number of VMs experiencing varying execution speeds, so that a target value of tail sojourn times can be met. We particularly take an analytical perspective and focus on deriving the tail sojourn times for any given cloud cluster size using various key system parameters.
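The round-robin dispatch policy described above can be sketched in a few lines; this is a minimal illustration, not the load balancer actually used by any particular cloud provider, and the function name is our own:

```python
from itertools import cycle

def round_robin_dispatcher(num_servers):
    """Yield the server index for each incoming request in turn,
    as a simple front-end load balancer would."""
    return cycle(range(num_servers))

# Illustrative usage: route ten requests across a 3-VM cluster.
dispatcher = round_robin_dispatcher(3)
assignments = [next(dispatcher) for _ in range(10)]
# assignments == [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
```

Each VM then serves its assigned requests in processor-sharing fashion, so the per-request service rate at a VM drops as more requests accumulate there.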

Proposed System:

We develop an abstract parallel queueing system, where each queue is a G/G/1/PS with Markov-modulated execution speeds, to represent the application cluster hosted in a cloud. We obtain the distribution of the workloads accumulated in the system, with special focus on their tail, using large deviation analysis. To derive the approximation of the tail sojourn times, we leverage the workload distribution and a mean-based analysis of the M/G/1/PS queue with an average execution speed. We first derive the conditional distribution of the number of jobs for a given workload distribution for the M/G/1/PS queue. Then we develop an approximation scheme for the tail sojourn times of the cluster.
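To make the model concrete, the following is a minimal discrete-event simulation sketch of a single PS queue with a two-state Markov-modulated execution speed, of the kind one could use to estimate empirical tail sojourn probabilities. It is not the paper's analysis; all parameter names and defaults are illustrative assumptions (Poisson arrivals, exponential job sizes, two speed states with symmetric switching):

```python
import random

def simulate_ps_queue(lam=0.5, mean_size=1.0, speeds=(1.0, 0.5),
                      switch_rate=0.2, horizon=10000.0, seed=1):
    """Simulate one M/M/1/PS queue whose execution speed is modulated
    by a two-state Markov process; return the observed sojourn times."""
    rng = random.Random(seed)
    t, state = 0.0, 0            # current time and modulating state
    jobs = []                    # each job: [remaining_work, arrival_time]
    sojourns = []
    next_arrival = rng.expovariate(lam)
    next_switch = rng.expovariate(switch_rate)
    while t < horizon:
        speed = speeds[state]
        # Under PS, each of n jobs is served at rate speed / n.
        if jobs:
            per_job_rate = speed / len(jobs)
            next_dep = t + min(j[0] for j in jobs) / per_job_rate
        else:
            next_dep = float('inf')
        t_next = min(next_arrival, next_switch, next_dep)
        if jobs:                 # drain work shared equally among jobs
            drained = (t_next - t) * speed / len(jobs)
            for j in jobs:
                j[0] -= drained
        t = t_next
        if t == next_arrival:    # new job joins the PS server
            jobs.append([rng.expovariate(1.0 / mean_size), t])
            next_arrival = t + rng.expovariate(lam)
        elif t == next_switch:   # execution speed changes state
            state = 1 - state
            next_switch = t + rng.expovariate(switch_rate)
        else:                    # job with least remaining work departs
            done = min(jobs, key=lambda j: j[0])
            jobs.remove(done)
            sojourns.append(t - done[1])
    return sojourns

def tail_probability(sojourns, x):
    """Empirical tail probability P(sojourn time > x)."""
    return sum(s > x for s in sojourns) / len(sojourns)
```

Running N independent copies of this queue behind a round-robin front end gives an empirical counterpart against which an analytical tail approximation can be compared. With the defaults above, the average speed is 0.75, so the effective load is roughly 0.5 / 0.75 ≈ 0.67 and the queue is stable.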
