Turkish Journal of Electrical Engineering and Computer Sciences
Detecting similar documents plays an essential role in many real-world applications, such as copyright protection and plagiarism detection. The problem becomes more challenging when data privacy must be protected: the documents to be matched are distributed among two or more parties, and the privacy of each party's documents should be preserved. In this paper, we propose new privacy-preserving document similarity detection schemes based on locality-sensitive hashing, which can tolerate misspellings. Furthermore, the occurrences of a document's keywords are integrated into its underlying representation to support better ranking of the returned results. We introduce a new security definition that hides the exact similarity scores from the querying party. Extensive experiments on real-world data show that the proposed schemes are efficient and accurate.
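As a rough illustration of why locality-sensitive hashing can tolerate misspellings, the sketch below computes MinHash signatures over the character bigrams of a keyword. It is not the scheme proposed in the paper: the function names (`bigrams`, `minhash_signature`, `estimated_similarity`), the bigram representation, the SHA-256-based hash family, and the signature length of 64 are all assumptions made for this example, and no privacy protection is included.

```python
import hashlib


def bigrams(word):
    """Return the set of character bigrams of a padded, lower-cased keyword."""
    padded = f"_{word.lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token; each seed acts as one hash function."""
    digest = hashlib.sha256(f"{seed}:{token}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(word, num_hashes=64):
    """MinHash signature: for each seed, the minimum hash over the word's bigrams."""
    grams = bigrams(word)
    return [min(_hash(seed, g) for g in grams) for seed in range(num_hashes)]


def estimated_similarity(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def jaccard(a, b):
    """Exact Jaccard similarity of two sets."""
    return len(a & b) / len(a | b)


correct = minhash_signature("privacy")
typo = minhash_signature("privcay")      # transposed letters
unrelated = minhash_signature("hashing")

print(jaccard(bigrams("privacy"), bigrams("privcay")))  # exact value: 5/11
print(estimated_similarity(correct, typo))
print(estimated_similarity(correct, unrelated))
```

The misspelled keyword still shares most of its bigrams with the correct one, so the two signatures agree in a sizeable fraction of positions, whereas an unrelated keyword with no common bigrams agrees in essentially none. An exact-match hash of the whole keyword would treat the misspelling as a completely different term.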
Document similarity, locality-sensitive hashing, multiparty computing, privacy preserving
ABDULSADA, AYAD; AL-DARRAJI, SALAH; and HONI, DHAFER "Privacy preserving scheme for document similarity detection," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 30: No. 3, Article 10.
Available at: https://journals.tubitak.gov.tr/elektrik/vol30/iss3/10
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons