No icon

A Virtual-Machine Resiliency Management System for the Cloud

A Virtual-Machine Resiliency Management System for the Cloud


This article presents a scalable parallel virtual machine (VM) resiliency management system for the cloud environment. It describes two complementary mechanisms for providing resiliency: automated VM restart when a physical server fails unpredictably, and automated evacuation of the VMs from a physical server that is about to fail or needs to be maintained. The solution is suitable for clouds with a large number of physical servers, VMs, disks, networking components, and management tools. This solution has been deployed in IBM’s Cloud Managed Services enterprise cloud.

Existing System:

Certain problems arise from the large scale of the system and other constraints to be described below. A potentially large number of virtual machines (VMs) must be either restarted or evacuated, especially in scenarios where multiple physical servers are affected by the event and only a short time window is available to complete the resiliency processes. Every step of the resiliency process must be designed to be either prestaged (as in creation of a precomputed failover plan), performed very quickly (as in, alternatively, very fast failover plan computation), or performed in parallel (as in parallel disk mapping and VM migration).

Proposed System:

Application-level HA techniques are built around application-aware clustering technology. These solutions are used to improve the availability of applications by continuously monitoring the application’s specific resources and their physical-server environment and invoking recovery procedures when failures occur. These solutions typically use multiple VMs that are working together in order to ensure that an application is always available. These VMs are arranged in an active–passive or active–active configuration. When one VM fails, its functionality is taken over by the backup VM in the cluster. Examples of these solutions are IBM PowerHA, Microsoft Clustering Services, Symantec Cluster Server, and Linux-HA.

Comment As:

Comment (2)