Turkish Journal of Electrical Engineering and Computer Sciences
DOI
10.3906/elk-1708-336
Abstract
Data deduplication is a capacity optimization technology used in backup systems to identify and store only the nonredundant data blocks. The CPU-intensive tasks involved in a hash-based deduplication system remain a challenge to improving the system's performance. In this paper, we propose a parallel variant of standard cuckoo hashing that allows the hashing to be performed in parallel. The CPU-intensive fingerprint insertion and lookup operations are performed in parallel and distributed among the nodes of the deduplication cluster. Furthermore, because the cluster nodes handle blocks uniformly during duplicate identification, the system achieves good load balance. Experimental evaluations using real-world backup and Linux kernel data sets show that the proposed deduplication system achieves up to 100% higher backup speed, up to 28% lower lookup latency, and up to 24% shorter backup time than the other deduplication systems compared.
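The abstract does not describe the internals of the parallel cuckoo variant. As a rough illustration of the building blocks it names (fingerprint insertion and lookup via cuckoo hashing, distributed across cluster nodes with a uniform share of blocks per node), the Python sketch below gives each node a standard two-table cuckoo hash set of chunk fingerprints and routes every fingerprint to a node by its value. The class names, SHA-1 fingerprints, modulo routing rule, and double-and-rehash policy on insertion failure are all assumptions for illustration, not the authors' design. Threads stand in for cluster nodes; under CPython's GIL they show the structure of the work split but do not give true CPU parallelism.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


class CuckooFingerprintTable:
    """Standard two-table cuckoo hash set of chunk fingerprints (bytes).

    Hypothetical helper: each fingerprint has exactly one candidate slot
    per table, so a lookup probes at most two slots.
    """

    def __init__(self, capacity=1024, max_kicks=500):
        self.capacity = capacity
        self.max_kicks = max_kicks
        self.tables = [[None] * capacity, [None] * capacity]

    def _slot(self, fp, which):
        # Derive two independent slot indices by salting the hash with the table id.
        digest = hashlib.sha256(bytes([which]) + fp).digest()
        return int.from_bytes(digest, "big") % self.capacity

    def contains(self, fp):
        return any(self.tables[i][self._slot(fp, i)] == fp for i in (0, 1))

    def insert(self, fp):
        """Insert fp; return True if it was new, False if it was a duplicate."""
        if self.contains(fp):
            return False
        item, which = fp, 0
        for _ in range(self.max_kicks):
            idx = self._slot(item, which)
            # Place item and evict whatever occupied the slot.
            item, self.tables[which][idx] = self.tables[which][idx], item
            if item is None:
                return True
            which = 1 - which  # the evicted item moves to its slot in the other table
        self._rehash(item)  # too many evictions: grow the tables and reinsert
        return True

    def _rehash(self, pending):
        items = [x for t in self.tables for x in t if x is not None] + [pending]
        self.capacity *= 2
        self.tables = [[None] * self.capacity, [None] * self.capacity]
        for x in items:
            self.insert(x)


class DedupCluster:
    """Hypothetical cluster: each node owns one cuckoo table, and fingerprints
    are routed by value so that nodes receive a roughly uniform share."""

    def __init__(self, num_nodes=4):
        self.nodes = [CuckooFingerprintTable() for _ in range(num_nodes)]

    def _owner(self, fp):
        return int.from_bytes(fp[:8], "big") % len(self.nodes)

    def backup(self, chunks):
        """Return the chunks that are unique and therefore must be stored."""
        fps = [hashlib.sha1(c).digest() for c in chunks]
        buckets = [[] for _ in self.nodes]
        for i, fp in enumerate(fps):
            buckets[self._owner(fp)].append(i)

        def work(node_id):
            # Only this worker touches this node's table, so no locking is needed.
            table = self.nodes[node_id]
            return [i for i in buckets[node_id] if table.insert(fps[i])]

        with ThreadPoolExecutor(max_workers=len(self.nodes)) as pool:
            results = pool.map(work, range(len(self.nodes)))
            new_idx = sorted(i for r in results for i in r)
        return [chunks[i] for i in new_idx]


if __name__ == "__main__":
    cluster = DedupCluster(num_nodes=4)
    print(cluster.backup([b"alpha", b"beta", b"alpha", b"gamma", b"beta"]))
    print(cluster.backup([b"alpha", b"delta"]))
```

Running the script prints `[b'alpha', b'beta', b'gamma']` for the first backup and `[b'delta']` for the second, since `alpha` is already stored. Routing by fingerprint value means identical chunks always reach the same node, so each node can decide duplicates locally without coordinating with the others.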
Keywords
Deduplication, parallelized cuckoo, backup
First Page
1417
Last Page
1429
Recommended Citation
JEYARAJ, JANE RUBEL ANGELINA; KAMBARAJ, SUNDARAKANTHAM; and DHARMARAJAN, VELMURUGAN (2018) "High-speed data deduplication using parallelized cuckoo hashing," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 26: No. 3, Article 24.
https://doi.org/10.3906/elk-1708-336
Available at:
https://journals.tubitak.gov.tr/elektrik/vol26/iss3/24