•  
  •  
 

Turkish Journal of Electrical Engineering and Computer Sciences

DOI

10.55730/1300-0632.3801

Abstract

The problem of detecting similar documents plays an essential role for many real-world applications, such as copyright protection and plagiarism detection. To protect data privacy, the new version of such a problem becomes more challenging, where the matched documents are distributed among two or more parties and their privacy should be preserved. In this paper, we propose new privacy-preserving document similarity detection schemes by utilizing the locality-sensitive hashing technique, which can handle the misspelled mistakes. Furthermore, the keywords' occurrences of a given document are integrated into its underlying representation to support a better ranking for the returned results. We introduced a new security definition, which hides the exact similarity scores towards the querying party. Extensive experiments on real-world data illustrate that our proposed schemes are efficient and accurate.

Keywords

Document similarity, local sensitive hashing, multiparty computing, privacy preserving

First Page

609

Last Page

628

Share

COinS