No icon

Towards Max-Min Fair Resource Allocation for Stream Big Data Analytics in Shared Clouds

Towards Max-Min Fair Resource Allocation for Stream Big Data Analytics in Shared Clouds

Abstract:

Distributed stream big data analytics platforms have emerged to tackle the continuously generated data streams. In stream big data analytics, the data processing workflow is abstracted as a directed graph referred to as a topology. Data are read from the storage and processed tuple by tuple, and these processing results are updated dynamically. The performance of a topology is evaluated by its throughput. This paper proposes an efficient resource allocation scheme for a heterogeneous shared stream big data analytics cluster shared by multiple topologies, in order to achieve max-min fairness in the utilities of the throughput for all the topologies. We first formulate a novel model resource allocation problem, which is a mixed 0-1 integer program. The NP-hardness of the problem is rigorously proven. To tackle this problem, we transform the non-convex constraint to several linear constraints using linearization and reformulation techniques. Based on the analysis of the problem-specific structure and characteristics, we propose an approach that iteratively solves the continuous problem with a fixed set of discrete variables optimally, and updates the discrete variables heuristically. Simulations show that our proposed resource allocation scheme remarkably improves the max-min fairness in utilities of the topology throughput, and is low in computational complexity.

Existing System:

To make the problem even more challenging, different topologies running different analytics applications usually require different amounts of resources at the same throughput level. Meanwhile, since tasks are the basic entities for resource allocation, even inside a topology with a certain throughput level, the resource demands of different tasks can be different. Therefore, if tasks with high resource demands to achieve the same throughput are colocated on the same node with low resource capacity, the above bottleneck problem will be even more pronounced.

 

Proposed system:

We model the resource provisioning in stream processing platforms where both the topology heterogeneity and the node heterogeneity are captured. Performances of topologies are modeled by utility functions of the throughput. The max-min fair resource allocation problem is then novelly formulated.

We conduct simulations to verify the efficacy of our proposed resource allocation scheme. Simulation results show that our proposed scheme remarkably improves the max-min fairness in utilities of the topology throughputs. Meanwhile, the computational complexity of our scheme is shown to be relatively low in practice.

Comment As:

Comment (0)