As the data-driven paradigm for intelligent systems design gains prominence, performance requirements have become very stringent, leading to numerous fine-tuned versions of Hadoop and its MapReduce programming model. However, very few researchers have investigated the effect of intelligent reducer placement on Hadoop's performance. This paper examines the largely overlooked reducer placement phase as a means of improving Hadoop's performance and proposes spawning the reduce phase of Hadoop tasks asynchronously across the nodes of a Hadoop cluster. The main contributions of this paper are: (i) tracking when the map phase of each task completes, (ii) counting the number of completed maps, and (iii) assigning reducers to Hadoop nodes based on these map counts such that run-time data copying is minimized. To this end, the paper presents a novel counter-based reducer placement (CBRP) algorithm that uses the counter values maintained by the JobTracker at the rack and node levels. Experiments demonstrate the merit of the proposed reducer placement, with average improvements ranging between 5% and 17% across different benchmarks under both late shuffle and early shuffle.
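The counting idea summarized above can be illustrated with a minimal sketch. It is not the CBRP algorithm itself, whose details the abstract does not give. It assumes that map output stays on the node that ran the map, so placing reducers where the most maps completed reduces copying. The class `MapCompletionCounters`, the function `place_reducers`, the ranking rule (rack count first, then node count), and the node and rack names are all hypothetical.

```python
from collections import Counter


class MapCompletionCounters:
    """Hypothetical stand-in for counters kept at the node and rack levels."""

    def __init__(self, node_to_rack):
        self.node_to_rack = dict(node_to_rack)  # node name -> rack name
        self.node_counts = Counter()            # completed maps per node
        self.rack_counts = Counter()            # completed maps per rack

    def map_completed(self, node):
        """Record that one map task finished on `node`."""
        self.node_counts[node] += 1
        self.rack_counts[self.node_to_rack[node]] += 1


def place_reducers(counters, num_reducers):
    """Pick up to `num_reducers` distinct nodes, preferring busy racks and nodes.

    Assumed ranking rule (not taken from the paper): sort nodes by the
    completed-map count of their rack, then by their own count, both
    descending, with the node name as a deterministic tie-breaker.
    """
    ranked = sorted(
        counters.node_to_rack,
        key=lambda n: (
            -counters.rack_counts[counters.node_to_rack[n]],
            -counters.node_counts[n],
            n,
        ),
    )
    return ranked[:num_reducers]


# Toy cluster: two racks with two nodes each (illustrative names).
counters = MapCompletionCounters(
    {"n1": "rackA", "n2": "rackA", "n3": "rackB", "n4": "rackB"}
)
# Simulated sequence of map-completion events.
for node in ["n1", "n1", "n2", "n3", "n1", "n4", "n2"]:
    counters.map_completed(node)

placement = place_reducers(counters, 2)
print(placement)  # ['n1', 'n2']
```

In this toy run, rackA accumulates five completed maps against two for rackB, so both reducers land on rackA nodes. A real scheduler would also have to account for free reduce slots and the size of each map's output, which this sketch ignores.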
HUSSAIN, MIR WAJAHAT; REDDY, K HEMANT; and ROY, DIPTENDU SINHA
"A counter based approach for reducer placement with augmented Hadoop rackawareness,"
Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 29: No. 1, Article 28.
Available at: https://journals.tubitak.gov.tr/elektrik/vol29/iss1/28