DOI

10.3906/elk-1901-91

Abstract

Sentence similarity is the task of assessing how similar two snippets of text are. Similarity techniques are used extensively in clustering, summarization, classification, plagiarism detection, and related tasks. Because a sentence contains only a small set of words, sentence similarity is considered a difficult problem in natural language processing. Solving it raises two questions: (1) which similarity technique to use for word pairs, and (2) how to generalize that technique to sentence pairs. We used the weighted path, a WordNet-based similarity assessment, and the paraphrase database to obtain word pair similarity values. We then extracted the maximum values from the pairwise similarity matrix and computed a similarity value for each sentence pair. We also incorporated a vector space model technique to form a robust similarity measure. On the STSS65 test dataset, our method outperformed state-of-the-art methods, reaching a Pearson's correlation of 87% with human similarity scores. On the STSS131 test data, our approach performed on par with other methods under the same test. It also outperformed all the other WordNet-based methods compared on both datasets.
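
The matrix step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the word-pair scores in `TOY_SCORES` are made-up placeholders standing in for the weighted path and paraphrase database values, the function names are hypothetical, and averaging the row and column maxima is only one assumed way to turn the matrix into a sentence score. The vector space model component is omitted.

```python
from typing import Callable, Dict, FrozenSet, List

# Hypothetical toy word-pair scores. They stand in for WordNet weighted-path
# or paraphrase-database lookups; the numbers are invented for illustration.
TOY_SCORES: Dict[FrozenSet[str], float] = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("boy", "lad")): 0.8,
}


def toy_word_similarity(a: str, b: str) -> float:
    """Return 1.0 for identical words, a table score if listed, else 0.0."""
    if a == b:
        return 1.0
    return TOY_SCORES.get(frozenset((a, b)), 0.0)


def similarity_matrix(
    s1: List[str], s2: List[str], word_sim: Callable[[str, str], float]
) -> List[List[float]]:
    """Build the pairwise matrix: rows are words of s1, columns words of s2."""
    return [[word_sim(a, b) for b in s2] for a in s1]


def sentence_similarity(
    s1: List[str],
    s2: List[str],
    word_sim: Callable[[str, str], float] = toy_word_similarity,
) -> float:
    """Average the best match of every word, in both directions.

    This aggregation is an assumption made for the sketch, not a formula
    taken from the paper.
    """
    if not s1 or not s2:
        return 0.0
    matrix = similarity_matrix(s1, s2, word_sim)
    row_max = [max(row) for row in matrix]        # best match for each s1 word
    col_max = [max(col) for col in zip(*matrix)]  # best match for each s2 word
    return (sum(row_max) + sum(col_max)) / (len(s1) + len(s2))


if __name__ == "__main__":
    first = ["the", "boy", "drives", "a", "car"]
    second = ["a", "lad", "drives", "the", "automobile"]
    print(round(sentence_similarity(first, second), 2))
```

Taking the maximum per row and per column means each word is scored by its closest counterpart in the other sentence, so word order does not affect the result in this sketch.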

Keywords

Sentence similarity, plagiarism detection, text mining, vector space model, paraphrase database

First Page

3779

Last Page

3790

Recommended Citation

JAVADZADEH, REZA; ZAHEDI, MORTEZA; and RAHIMI, MARZIEA (2019) "Sentence similarity using weighted path and similarity matrices," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 27: No. 5, Article 36. https://doi.org/10.3906/elk-1901-91
Available at: https://journals.tubitak.gov.tr/elektrik/vol27/iss5/36

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Turkish Journal of Electrical Engineering and Computer Sciences

Sentence similarity using weighted path and similarity matrices
