Turkish Journal of Electrical Engineering and Computer Sciences

Deep learning-based Turkish spelling error detection with a multi-class false positive reduction model

DOI

10.55730/1300-0632.4003

Abstract

Spell checking and correction are important steps in the text normalization process. These tasks are more challenging in agglutinative languages such as Turkish, since many words can be derived from a single root by attaching multiple suffixes. In this study, we propose a two-step deep learning-based model for misspelled word detection in Turkish. A false positive reduction model is integrated into the system to reduce the false positive predictions caused by foreign words and abbreviations, which are common on Internet sharing platforms. For this purpose, we create a multi-class dataset by developing a mobile application for labeling. We compare the effect of different tokenizers, including character-based, syllable-based, and byte-pair encoding (BPE) approaches, combined with Long Short-Term Memory (LSTM) and Bi-directional LSTM (Bi-LSTM) networks. The findings show that the proposed Bi-LSTM-based model with the BPE tokenizer outperforms the benchmark methods. The results also indicate that the false positive reduction step significantly increases the precision of the base detection model at the cost of a comparatively smaller drop in its recall.

Keywords

Text normalization, spell checker, tokenizers, long short-term memory, agglutinative languages

First Page

581

Last Page

595

Recommended Citation

AYTAN, BURAK and ŞAKAR, CEMAL OKAN (2023) "Deep learning-based Turkish spelling error detection with a multi-class false positive reduction model," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 31: No. 3, Article 7. https://doi.org/10.55730/1300-0632.4003
Available at: https://journals.tubitak.gov.tr/elektrik/vol31/iss3/7

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons
