DOI

10.55730/1300-0632.3912

Abstract

Because Turkish is an agglutinative language, it has a complex morphological structure, and a given word generally has more than one possible parse. Before further processing, morphological disambiguation is required to determine the correct morphological analysis of a word. It is one of the first and most crucial steps in natural language processing, since later analyses depend on its success. Our proposed morphological disambiguation method uses a transformer-based sequence-to-sequence neural network architecture. Transformers are commonly used in various NLP tasks, and they produce state-of-the-art results in machine translation. However, to the best of our knowledge, transformer-based encoder-decoders have not been studied for morphological disambiguation. In this study, in addition to character-level tokenization, three subword input representations are evaluated: unigram, byte-pair, and WordPiece tokenization. The best accuracy, 96.25%, was achieved with the character input representation. Although the proposed model was developed for Turkish, it is not language-dependent, so it can be applied to a larger set of languages.

Keywords

Natural language analysis, agglutinative languages, machine learning methods, morphological disambiguation, morphological analysis, transformer network

First Page

1897

Last Page

1913

Recommended Citation

ÖZER, HİLAL and KORKMAZ, EMİN ERKAN (2022) "Transmorph: a transformer based morphological disambiguator for Turkish," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 30: No. 5, Article 15. https://doi.org/10.55730/1300-0632.3912
Available at: https://journals.tubitak.gov.tr/elektrik/vol30/iss5/15

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Turkish Journal of Electrical Engineering and Computer Sciences

Transmorph: a transformer based morphological disambiguator for Turkish
