Turkish Journal of Electrical Engineering and Computer Sciences
DOI
10.3906/elk-1901-91
Abstract
Sentence similarity is the task of assessing how similar two snippets of text are. Similarity techniques are used extensively in clustering, summarization, classification, plagiarism detection, and related tasks. Because a sentence contains only a small number of words, sentence similarity is considered a difficult problem in natural language processing. Solving it raises two issues: (1) which similarity technique to use for word pairs, and (2) how to generalize word-pair similarity to sentence pairs. We used the weighted path, a WordNet-based similarity measure, together with the paraphrase database to obtain word-pair similarity values. We then extracted the maximum values from the pairwise similarity matrix and computed a similarity value for each sentence pair. We also incorporated a vector space model technique to form a robust similarity measure. On the STSS65 test dataset, our method outperformed state-of-the-art methods, reaching a Pearson correlation of 87% with human similarity scores. On the STSS131 test dataset, evaluated with the same measure, it performed on par with other methods. On both datasets, our approach outperformed all other WordNet-based methods it was compared against.
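The pipeline described in the abstract (word-pair similarities, a pairwise similarity matrix reduced by its maximum values, and a vector space model score) can be illustrated with a small Python sketch. This is not the authors' implementation. The WordNet weighted-path measure and the paraphrase database are replaced by a tiny hypothetical lookup table (TOY_WORD_SIM). The row-and-column maximum aggregation, the term-frequency cosine, and the blend weight alpha are all illustrative assumptions; the abstract does not specify these details. Inputs are assumed to be already tokenized and lowercased.

```python
import math
from collections import Counter

# Hypothetical stand-in for word-pair similarity. The paper combines the
# WordNet weighted-path measure with the paraphrase database; this tiny
# hand-made table plays that role purely for illustration.
TOY_WORD_SIM = {
    frozenset({"car", "automobile"}): 1.0,
    frozenset({"car", "vehicle"}): 0.8,
    frozenset({"quick", "fast"}): 0.9,
}


def word_sim(a: str, b: str) -> float:
    """Return a similarity in [0, 1] for two words (toy lookup, not WordNet)."""
    if a == b:
        return 1.0
    return TOY_WORD_SIM.get(frozenset({a, b}), 0.0)


def matrix_sim(s1: list[str], s2: list[str]) -> float:
    """Build the pairwise word-similarity matrix and aggregate its maxima.

    Assumption: take the maximum of each row and of each column, average
    each direction, then average the two directional scores.
    """
    if not s1 or not s2:
        return 0.0
    m = [[word_sim(a, b) for b in s2] for a in s1]
    row_max = [max(row) for row in m]
    col_max = [max(m[i][j] for i in range(len(s1))) for j in range(len(s2))]
    return (sum(row_max) / len(row_max) + sum(col_max) / len(col_max)) / 2


def cosine_sim(s1: list[str], s2: list[str]) -> float:
    """Cosine similarity of bag-of-words term-frequency vectors."""
    v1, v2 = Counter(s1), Counter(s2)
    dot = sum(v1[t] * v2[t] for t in v1)
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def sentence_sim(s1: list[str], s2: list[str], alpha: float = 0.5) -> float:
    """Hypothetical linear blend of the matrix score and the cosine score.

    alpha is an illustrative parameter, not a value taken from the paper.
    """
    return alpha * matrix_sim(s1, s2) + (1 - alpha) * cosine_sim(s1, s2)


if __name__ == "__main__":
    a = ["a", "quick", "car"]
    b = ["a", "fast", "automobile"]
    print(round(sentence_sim(a, b), 3))
```

For the pair "a quick car" and "a fast automobile", the matrix score is high (about 0.97) because each word has a close counterpart, while the plain cosine score is only about 0.33 because just one token matches exactly. The blended value is roughly 0.65. Taking maxima lets a word-level measure credit synonyms that an exact-match vector model misses.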
Keywords
Sentence similarity, plagiarism detection, text mining, vector space model, paraphrase database
First Page
3779
Last Page
3790
Recommended Citation
JAVADZADEH, REZA; ZAHEDI, MORTEZA; and RAHIMI, MARZIEA (2019) "Sentence similarity using weighted path and similarity matrices," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 27: No. 5, Article 36. https://doi.org/10.3906/elk-1901-91
Available at:
https://journals.tubitak.gov.tr/elektrik/vol27/iss5/36