Turkish Journal of Electrical Engineering and Computer Sciences
DOI
10.55730/1300-0632.3801
Abstract
Detecting similar documents plays an essential role in many real-world applications, such as copyright protection and plagiarism detection. The problem becomes more challenging when data privacy must be protected: the documents to be matched are distributed among two or more parties, and each party's privacy should be preserved. In this paper, we propose new privacy-preserving document similarity detection schemes based on locality-sensitive hashing, which can tolerate misspellings. Furthermore, the keyword occurrences of a given document are integrated into its underlying representation to support better ranking of the returned results. We introduce a new security definition, which hides the exact similarity scores from the querying party. Extensive experiments on real-world data illustrate that our proposed schemes are efficient and accurate.
Keywords
Document similarity, locality-sensitive hashing, multiparty computing, privacy preserving
First Page
609
Last Page
628
Recommended Citation
ABDULSADA, AYAD; AL-DARRAJI, SALAH; and HONI, DHAFER (2022) "Privacy preserving scheme for document similarity detection," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 30: No. 3, Article 10.
https://doi.org/10.55730/1300-0632.3801
Available at:
https://journals.tubitak.gov.tr/elektrik/vol30/iss3/10
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons