Turkish Journal of Electrical Engineering and Computer Sciences
DOI
10.3906/elk-1709-185
Abstract
In multiword expressions (MWEs), multiple words unite to form a new unit of language. When MWE identification is treated as a binary classification task, one of the most important factors in performance is training the classifier with a sufficient number of labelled samples. Since manual labelling is time-consuming, the performance of MWE recognition studies is limited by the size of the training sets. In this study, we propose the comparison-based and common-decision co-training approaches to enlarge the MWE dataset. In the experiments, the performance of the proposed approaches was compared to that of standard co-training [1] and manual labelling, where statistical and linguistic features are employed as two different views of the MWE dataset [2]. A number of tests with different settings were performed on a Turkish MWE dataset. Ten different classifiers were utilized in the experiments, and the best-performing classifier pair was observed to be the SMO-SMO pair. The experimental results showed that the common-decision co-training approach is an alternative to hand-labelling of large MWE datasets and that both newly proposed approaches outperform standard co-training [2] when the training set is to be enlarged in MWE classification.
Keywords
Multiword expression, classification, training set, co-training
First Page
2583
Last Page
2594
Recommended Citation
METİN, SENEM KUMOVA (2018) "Enlarging multiword expression dataset by co-training," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 26: No. 5, Article 34.
https://doi.org/10.3906/elk-1709-185
Available at:
https://journals.tubitak.gov.tr/elektrik/vol26/iss5/34
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons