A Coding Based Optimization for Hadoop
Authors: Zakia Asad (University of Toronto), Mohammad Asad Rehman Chaudhry (soptimizer), David Malone (The Hamilton Institute)
Abstract: The rise of cloud and distributed data-intensive ("Big Data") applications puts pressure on data center networks because of the massive volumes of data they move. Reducing the volume of communication is pivotal for greener data exchange through efficient use of network resources. This work proposes coding techniques working in tandem with software-defined network control as a means of dynamically controlled reduction in the volume of communication. We introduce motivating real-world use cases and present a novel spate coding algorithm for data center networks. Moreover, we bridge the gap between theory and practice with a proof-of-concept implementation of the proposed system in a real-world data center. We use Hadoop as our target framework. The experimental results show the advantage of the proposed system over “vanilla Hadoop implementation”, “in-network combiner”, and “Combine-N-Code” in terms of volume of communication, goodput, and the number of bits that can be transmitted per Joule of energy.
Two-page extended abstract: pdf