A Fast Heuristic Attribute Reduction Algorithm using Spark
Mincheng Chen, Jingling Yuan, Lin Li, Dongling Liu and Tao Li
Wuhan University of Technology, Wuhan University of Technology, Wuhan University of Technology, Wuhan University of Technology, University of Florida

Energy data, which consists of energy consumption statistics and other related data from green data centers, is growing dramatically. This data has great value, but many of its attributes are redundant or unnecessary, so attribute reduction has become a critical step in processing it. However, many existing attribute reduction algorithms are computationally expensive. To address this issue, we extend the methodology of rough sets to construct a knowledge representation system for data center energy consumption. Taking advantage of in-memory computing, we propose an attribute reduction algorithm for energy data using Spark. The algorithm uses a heuristic formula for measuring the significance of attributes to reduce the search space, and an efficient algorithm for simplifying the energy consumption decision table, both of which further improve computational efficiency. The experimental results show that our algorithm achieves up to a 0.28X performance improvement over the traditional attribute reduction algorithm using Spark.
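To make the idea of heuristic attribute reduction concrete, the following is a minimal single-machine sketch in plain Python. It is not the algorithm proposed in this paper and does not use Spark. It assumes the classic rough-set positive-region size as the significance measure and treats removal of duplicate rows as the table simplification step; the function names and the toy decision table are hypothetical.

```python
from collections import defaultdict


def simplify(rows):
    """Assumed simplification step: drop duplicate rows, keeping first-seen order."""
    return list(dict.fromkeys(rows))


def positive_region_size(rows, attrs, decision):
    """Count rows whose equivalence class under `attrs` has a single decision value."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes[key].append(row[decision])
    return sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)


def greedy_reduct(rows, conditions, decision):
    """Greedily add the attribute that enlarges the positive region the most,
    stopping once the selected subset classifies as well as all conditions."""
    rows = simplify(rows)
    target = positive_region_size(rows, conditions, decision)
    reduct = []
    current = positive_region_size(rows, reduct, decision)
    while current < target:
        candidates = [a for a in conditions if a not in reduct]
        best = max(
            candidates,
            key=lambda a: positive_region_size(rows, reduct + [a], decision),
        )
        reduct.append(best)
        current = positive_region_size(rows, reduct, decision)
    return reduct


# Hypothetical toy table: (cpu_load, temperature, fan, high_energy)
table = [
    ("high", "hot", "on", "yes"),
    ("high", "cool", "on", "yes"),
    ("low", "hot", "on", "no"),
    ("low", "cool", "off", "no"),
    ("high", "hot", "on", "yes"),  # duplicate row, removed by simplify()
]

print(greedy_reduct(table, conditions=[0, 1, 2], decision=3))
```

In this toy table the first column alone separates the decision values, so the greedy search stops after selecting a single attribute. A distributed version would compute the equivalence classes with a grouped aggregation rather than an in-memory dictionary.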