JouleMR: Towards Cost-Effective and Green-Aware Data Processing Frameworks

Abstract:

Interest has been growing in managing cluster energy effectively in order to reduce both energy consumption and electricity cost. Renewable energy and dynamic pricing schemes in smart grids are two major emerging trends in energy markets. However, current data processing frameworks are not aware of the efficiency of each joule consumed by data center workloads in the context of these two trends. In fact, not all joules are equal: the amount of work that a joule can do varies significantly in data centers. Ignoring this fact leads to significant energy waste (25% of the total energy consumption in Hadoop YARN on a Facebook production trace, according to our study). In this paper, we propose JouleMR, a cost-effective and green-aware data processing framework. Specifically, we investigate how to exploit joule efficiency to maximize the benefits of renewable energy and dynamic pricing schemes for the MapReduce framework. We develop job/task scheduling algorithms with a particular focus on the factors that affect joule efficiency in the data center, including the energy efficiency of MapReduce workloads, renewable energy supply, dynamic pricing, and battery usage. We further develop a simple yet effective performance-energy consumption model to guide our scheduling decisions. We have implemented JouleMR on top of Hadoop YARN. The experiments demonstrate the accuracy of our models and show that our cost-effective and green-aware optimizations outperform the state-of-the-art implementations over Hadoop YARN.

Existing System:

When leveraging renewable energy and dynamic pricing schemes in data processing frameworks, the key challenge is that both are time-varying. Renewable energy sources are intermittent due to daily and seasonal effects, and prices in electricity markets vary on an hourly basis. Thus, the renewable energy supply or the low-price electricity may not match the workload demands, which results in severe under-utilization of renewable energy in a non-green-aware system or a significant financial burden in a non-cost-effective system. Green-aware systems and cost-effective systems have been developed to address this mismatch in the context of data centers. The core ideas behind those studies are similar: they delay workloads, within each job's deadline, to match the renewable supply or the low-price electricity.
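The delay-to-match idea can be illustrated with a minimal sketch. This is not the algorithm of any particular system; the function names, the hourly granularity, and the assumption that renewable power is free and consumed before grid power are all hypothetical simplifications made for illustration.

```python
from typing import Sequence


def grid_cost(start: int, duration: int, power_kw: float,
              price: Sequence[float], renewable_kw: Sequence[float]) -> float:
    """Grid electricity cost if the job runs in hours [start, start + duration).

    Assumption: renewable power is free and used first; only the
    shortfall is bought from the grid at that hour's price ($/kWh).
    """
    cost = 0.0
    for hour in range(start, start + duration):
        grid_kw = max(0.0, power_kw - renewable_kw[hour])
        cost += grid_kw * price[hour]  # kW * 1 h * $/kWh
    return cost


def best_start(arrival: int, duration: int, deadline: int, power_kw: float,
               price: Sequence[float], renewable_kw: Sequence[float]) -> int:
    """Pick the cheapest start hour that still finishes by the deadline.

    Ties are broken in favor of the earliest start.
    """
    candidates = range(arrival, deadline - duration + 1)
    return min(candidates,
               key=lambda s: grid_cost(s, duration, power_kw, price, renewable_kw))


# Hypothetical hourly inputs: price in $/kWh, renewable supply in kW.
price = [0.20, 0.20, 0.10, 0.10, 0.30, 0.30]
renewable = [0.0, 0.0, 2.0, 4.0, 4.0, 0.0]

start = best_start(arrival=0, duration=2, deadline=6, power_kw=4.0,
                   price=price, renewable_kw=renewable)
print(start, grid_cost(start, 2, 4.0, price, renewable))
```

With these made-up inputs, running the job immediately buys all of its energy from the grid, while delaying it to the hours with the highest renewable supply avoids grid purchases entirely. The job still finishes before its deadline.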

Proposed System:

JouleMR is a cost-effective and green-aware MapReduce framework. Using the slack time of each job, we develop job/task scheduling algorithms that give special consideration to the factors affecting joule efficiency in the data center, including the energy efficiency of MapReduce workloads, renewable energy supply, dynamic pricing, and battery usage. Since finding the optimal scheduling solution is an NP-hard problem, JouleMR adopts a series of simple and effective heuristics for optimization. We further develop a performance-energy consumption model. The model guides the scheduling decisions so that deadlines can be met and joule efficiency can be accurately estimated, allowing each job/task to be scheduled at an appropriate time.
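To make the role of a performance-energy model concrete, here is a toy sketch. It is not JouleMR's actual model, whose details are not given here. It assumes that tasks run in equal-length waves, that power is a fixed idle draw plus a per-slot draw, and that joule efficiency is measured as tasks completed per joule. All names and parameters are hypothetical.

```python
import math
from typing import Optional, Tuple


def estimate(num_tasks: int, slots: int, task_seconds: int,
             idle_watts: float, watts_per_slot: float) -> Tuple[int, float]:
    """Return (runtime in seconds, energy in joules) under the toy model."""
    waves = math.ceil(num_tasks / slots)        # tasks run in waves of `slots`
    runtime = waves * task_seconds
    power = idle_watts + slots * watts_per_slot  # assumed constant while running
    return runtime, power * runtime


def pick_slots(num_tasks: int, max_slots: int, task_seconds: int,
               idle_watts: float, watts_per_slot: float,
               slack_seconds: int) -> Optional[int]:
    """Choose the slot count with the best tasks-per-joule that meets the slack.

    Returns None if no slot count finishes within the slack time.
    """
    best_slots, best_eff = None, 0.0
    for slots in range(1, max_slots + 1):
        runtime, energy = estimate(num_tasks, slots, task_seconds,
                                   idle_watts, watts_per_slot)
        if runtime > slack_seconds:
            continue  # would miss the deadline
        efficiency = num_tasks / energy  # work done per joule
        if efficiency > best_eff:
            best_slots, best_eff = slots, efficiency
    return best_slots


print(pick_slots(num_tasks=10, max_slots=4, task_seconds=60,
                 idle_watts=100.0, watts_per_slot=50.0, slack_seconds=300))
```

Even in this simplified setting, the same job consumes different amounts of energy depending on how it is scheduled, because the idle draw is paid for the whole runtime. A scheduler guided by such a model can discard configurations that miss the deadline and rank the remaining ones by estimated joule efficiency.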
