Abstract

In multiword expressions (MWEs), multiple words unite to build a new unit in language. When MWE identification is accepted as a binary classification task, one of the most important factors in performance is to train the classifier with enough number of labelled samples. Since manual labelling is a time-consuming task, the performances of MWE recognition studies are limited with the size of the training sets. In this study, we propose the comparison-based and common-decision co-training approaches in order to enlarge the MWE dataset. In the experiments, the performances of the proposed approaches were compared to those of the standard co-training [1] and manual labelling where statistical and linguistic features are employed as two different views of the MWE dataset [2]. A number of tests with different settings were performed on a Turkish MWE dataset. Ten different classifiers were utilized in the experiments and the best performing classifier pair was observed to be the SMO-SMO pair. The experimental results showed that the common-decision co-training approach is an alternative to hand-labeling of large MWE datasets and both newly proposed approaches outperform the standard co-training [2] when the training set is to be enlarged in MWE classification.

DOI

10.3906/elk-1709-185

Keywords

Multiword expression, classification, training set, co-training

First Page

2583

Last Page

2594

Recommended Citation

METİN, S. K (2018). Enlarging multiword expression dataset by co-training. Turkish Journal of Electrical Engineering and Computer Sciences 26 (5): 2583-2594. https://doi.org/10.3906/elk-1709-185

Download

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

COinS

Turkish Journal of Electrical Engineering and Computer Sciences

Enlarging multiword expression dataset by co-training

Abstract

DOI

Keywords

First Page

Last Page

Recommended Citation

Included in

Issues by Year

Search

Turkish Journal of Electrical Engineering and Computer Sciences

Enlarging multiword expression dataset by co-training

Authors

Abstract

DOI

Keywords

First Page

Last Page

Recommended Citation

Included in

Share

Issues by Year

Search