DOI

10.55730/1300-0632.3957

Abstract

The present study aims to generate low-dimensional explicit distributional semantic vectors. In explicit semantic vectors, each dimension corresponds to a word, which makes the word vectors interpretable. In this study, a new approach is proposed to obtain low-dimensional explicit semantic vectors. First, the suggested approach uses three criteria (word similarity, number of zeros, and word frequency) as features for the words in a corpus. Rules for obtaining the initial basis words are then extracted from a decision tree built on these three features. Second, a binary weighting method based on the binary particle swarm optimization (BPSO) algorithm is proposed, which yields NB = 1000 context words. In addition, a word selection method is used to provide NS = 1000 context words. Third, the golden words of the corpus are extracted using the binary weighting method. These golden words are then added to the context words chosen by the word selection method to form the golden context words. The ukWaC corpus is used to construct the word vectors, and the MEN, RG-65, and SimLex-999 test sets are used to evaluate them. The results are compared to a baseline that uses the 5K most frequent words in the corpus as context words and counts co-occurrences within a fixed window. The proposed word vectors are obtained using the 1000 selected context words together with the golden context words. Compared to the baseline method, the suggested approach increases Spearman's correlation coefficient on the MEN, RG-65, and SimLex-999 test sets by 4.66%, 14.73%, and 1.08%, respectively.
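As a point of reference, the baseline described in the abstract (the most frequent corpus words as context words, co-occurrences counted in a fixed window, and evaluation by Spearman's correlation against human similarity ratings) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code: the window size (window=2), raw-count weighting, cosine similarity, and the (word1, word2, score) layout for test pairs are assumptions, and all function names are hypothetical. The proposed rule-based selection and BPSO weighting are not reproduced here.

```python
from collections import Counter
import math


def top_k_context_words(tokens, k):
    """Return the k most frequent tokens; these act as the context (basis) words."""
    return [w for w, _ in Counter(tokens).most_common(k)]


def cooccurrence_vectors(tokens, context_words, window=2):
    """Count context words within +/-window tokens of each target word.

    Each dimension corresponds to one context word, so the vectors are explicit.
    Assumption: raw counts with no reweighting; window=2 is a hypothetical size.
    """
    index = {w: i for i, w in enumerate(context_words)}
    vectors = {}
    for i, target in enumerate(tokens):
        vec = vectors.setdefault(target, [0] * len(context_words))
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in index:
                vec[index[tokens[j]]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for m in range(i, j + 1):
            ranks[order[m]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy) if sx and sy else 0.0


def evaluate(vectors, gold_pairs):
    """Correlate model cosines with human scores, skipping out-of-vocabulary pairs."""
    model, human = [], []
    for w1, w2, score in gold_pairs:
        if w1 in vectors and w2 in vectors:
            model.append(cosine(vectors[w1], vectors[w2]))
            human.append(score)
    return spearman(model, human)
```

For the baseline setting in the abstract, one would call `top_k_context_words(tokens, 5000)` on the tokenized corpus and pass the resulting vectors to `evaluate` with pairs from a test set such as MEN; reading ukWaC or the test-set files is outside the scope of this sketch.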

Keywords

Explicit word vectors, rule-based selection method, golden context words, final basis words

First Page

2586

Last Page

2604

Recommended Citation

PAKZAD, ATEFE and ANALOUI, MORTEZA (2022) "A rule-based/BPSO approach to produce low-dimensional semantic basis vectors set," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 30: No. 7, Article 8. https://doi.org/10.55730/1300-0632.3957
Available at: https://journals.tubitak.gov.tr/elektrik/vol30/iss7/8


Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons


Turkish Journal of Electrical Engineering and Computer Sciences

A rule-based/BPSO approach to produce low-dimensional semantic basis vectors set
