Turkish Journal of Electrical Engineering and Computer Sciences

Abstract

Detecting similar documents plays an essential role in many real-world applications, such as copyright protection and plagiarism detection. The problem becomes more challenging when data privacy must be protected: the documents to be matched are distributed among two or more parties, and their contents must remain private. In this paper, we propose new privacy-preserving document similarity detection schemes based on locality-sensitive hashing, which can tolerate misspellings. Furthermore, the occurrences of each document's keywords are integrated into its underlying representation to support better ranking of the returned results. We introduce a new security definition that hides the exact similarity scores from the querying party. Extensive experiments on real-world data illustrate that our proposed schemes are efficient and accurate.
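
As a rough illustration of why locality-sensitive hashing can tolerate misspellings, the sketch below applies MinHash, one common LSH family, to character bigrams of a keyword. It is not the scheme proposed in the paper and includes no privacy-preserving layer. The function names, the bigram representation, the padding character, and the choice of 64 hash functions are all assumptions made for this example.

```python
import hashlib


def char_ngrams(word, n=2):
    """Return the set of character n-grams of a word, padded with '_' (assumed convention)."""
    padded = f"_{word.lower()}_"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


def minhash_signature(grams, num_hashes=64):
    """MinHash signature: for each seeded hash function, keep the smallest hash value."""
    signature = []
    for seed in range(num_hashes):
        smallest = min(
            int.from_bytes(hashlib.sha256(f"{seed}:{g}".encode()).digest()[:8], "big")
            for g in grams
        )
        signature.append(smallest)
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching positions, which estimates the Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


correct = minhash_signature(char_ngrams("privacy"))
misspelled = minhash_signature(char_ngrams("privcay"))
unrelated = minhash_signature(char_ngrams("hashing"))

print(estimated_similarity(correct, misspelled))  # estimate of the bigram overlap
print(estimated_similarity(correct, unrelated))   # the two words share no bigrams
```

A misspelled keyword still shares most of its bigrams with the correct spelling, so the two signatures agree in a sizable fraction of positions, whereas an unrelated word shares none. Exact string hashing would treat the misspelling as a completely different keyword.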

DOI

10.55730/1300-0632.3801

Keywords

Document similarity, locality-sensitive hashing, multiparty computing, privacy preserving

First Page

609

Last Page

628
