Sentence similarity is the task of assessing how similar the two snippets of text are. Similarity techniques are used extensively in clustering, summarization, classification, plagiarism detection etc. Due to a small set of vocabularies, sentence similarity is considered to be a difficult problem in natural language processing. There are two issues in solving this problem: (1) Which similarity techniques to be used for word pair similarity and (2) How to generalize that to sentence pairs. We have used the weighted path, a WordNet-based similarity assessment, and the paraphrase database to obtain word pair similarity values. Thereafter, we extracted maximum values from the pairwise similarity matrix and computed a similarity value for a sentence pair. We have also incorporated a vector space model technique to form a robust similarity measure. Our method outperformed state-of-the-art methods on the STSS65 test dataset in Pearson's correlation of 87 % compared to human similarity scores. Moreover, our approach performed on par with other methods on the STSS131 test data using the same test. Our approach outperforms all the other WordNet-based methods compared on both datasets.
JAVADZADEH, REZA; ZAHEDI, MORTEZA; and RAHIMI, MARZIEA
"Sentence similarity using weighted path and similarity matrices,"
Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 27:
5, Article 36.
Available at: https://journals.tubitak.gov.tr/elektrik/vol27/iss5/36