Turkish Journal of Electrical Engineering and Computer Sciences
Abstract
The problem of detecting similar documents plays an essential role in many real-world applications, such as copyright protection and plagiarism detection. The problem becomes more challenging when data privacy must be protected: the documents to be matched are distributed among two or more parties, and each party's privacy should be preserved. In this paper, we propose new privacy-preserving document similarity detection schemes that use the locality-sensitive hashing technique, which can handle misspelled words. Furthermore, the keyword occurrence counts of a given document are integrated into its underlying representation to support better ranking of the returned results. We introduce a new security definition that hides the exact similarity scores from the querying party. Extensive experiments on real-world data show that our proposed schemes are efficient and accurate.
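The abstract names two ingredients: locality-sensitive hashing that tolerates misspelled keywords, and a document representation that carries keyword occurrence counts. The Python sketch below shows one way these could fit together. It rests on assumptions of our own and does not reproduce the paper's construction. Each keyword is represented by its set of character bigrams. MinHash with banding serves as the LSH family. A document becomes a bag of LSH bucket keys weighted by occurrence count, and two documents are scored by an inner product. All function names (`bigrams`, `minhash`, `lsh_keys`, `doc_vector`, `score`) and the parameters `bands=64` and `rows=2` are hypothetical. The sketch runs in plaintext and implements neither the multiparty protocol nor the score-hiding security definition.

```python
import hashlib
from collections import Counter

# Hypothetical sketch: character-bigram sets + MinHash with banding as the LSH
# family. This is an illustrative assumption, not the paper's exact scheme.
# Seeded SHA-256 stands in for a min-wise independent hash family.


def bigrams(word: str) -> set[str]:
    """Character bigrams of a keyword, e.g. 'data' -> {'da', 'at', 'ta'}."""
    w = word.lower()
    if len(w) < 2:
        return {w}  # single-character keywords fall back to the character itself
    return {w[i:i + 2] for i in range(len(w) - 1)}


def _hash(seed: int, token: str) -> int:
    # Deterministic 64-bit hash; Python's built-in hash() is salted per process.
    digest = hashlib.sha256(f"{seed}:{token}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash(tokens: set[str], num_hashes: int) -> tuple[int, ...]:
    """MinHash signature: the minimum of each seeded hash over the token set."""
    return tuple(min(_hash(seed, t) for t in tokens) for seed in range(num_hashes))


def jaccard_estimate(sig_a: tuple[int, ...], sig_b: tuple[int, ...]) -> float:
    """The fraction of agreeing positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_keys(word: str, bands: int = 64, rows: int = 2) -> list:
    """Split the signature into bands; each (band index, band values) is a bucket key."""
    sig = minhash(bigrams(word), bands * rows)
    return [(b, sig[b * rows:(b + 1) * rows]) for b in range(bands)]


def doc_vector(words: list[str], bands: int = 64, rows: int = 2) -> Counter:
    """Bag of LSH bucket keys; a repeated keyword adds its keys once per occurrence."""
    vec: Counter = Counter()
    for w in words:
        vec.update(lsh_keys(w, bands, rows))
    return vec


def score(v1: Counter, v2: Counter) -> int:
    """Inner product of two bucket-count vectors (higher means more similar)."""
    return sum(count * v2[key] for key, count in v1.items())


query = doc_vector(["privacy"])
print(score(query, doc_vector(["privacy", "privacy", "hashing"])))  # exact keyword, twice
print(score(query, doc_vector(["privasy", "hashing"])))             # misspelled keyword
```

A misspelling such as "privasy" shares four of the eight distinct bigrams in its union with "privacy" (Jaccard similarity 0.5). Each two-row band therefore collides with probability of roughly 0.25, so the misspelled keyword earns partial credit instead of none. Repeating a keyword doubles its contribution, which is how occurrence counts can influence the ranking. The exact scores for misspellings depend on the hash seeds.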
DOI
10.55730/1300-0632.3801
Keywords
Document similarity, locality-sensitive hashing, multiparty computing, privacy preserving
First Page
609
Last Page
628
Recommended Citation
ABDULSADA, AYAD; AL-DARRAJI, SALAH; and HONI, DHAFER (2022) "Privacy preserving scheme for document similarity detection," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 30: No. 3, Article 10.
https://doi.org/10.55730/1300-0632.3801
Available at: https://journals.tubitak.gov.tr/elektrik/vol30/iss3/10
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons