DOI

10.55730/1300-0632.4044

Abstract

Classification on imbalanced datasets has recently become one of the most actively researched areas in machine learning, because such datasets tend to produce low-performing models. A dataset is imbalanced when the classes of the target variable are represented by unequal numbers of examples. The most prevalent solutions can be categorized as data preprocessing, ensemble techniques, and cost-sensitive learning. In this article, we propose a new hybrid approach for binary classification, named FuzzyCSampling, which aims to increase model performance by combining fuzzy c-means clustering with data sampling. We compare the results of the proposed approach not only to a base model built on the imbalanced dataset but also to previously presented state-of-the-art solutions: undersampling, SMOTE oversampling, and Borderline-SMOTE oversampling. The evaluation metrics used for the comparison are accuracy, ROC AUC score, precision, recall, and F1-score. We evaluated the proposed method on three datasets with different imbalance ratios and with three machine learning algorithms (k-nearest neighbors, support vector machines, and random forest). According to the experiments, FuzzyCSampling is an effective way to improve model performance on imbalanced datasets.
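
To illustrate why accuracy is reported alongside precision, recall, and F1-score when classes are imbalanced, the following minimal sketch scores a trivial classifier that always predicts the majority class. The 95/5 class split, the `binary_metrics` helper, and the label encoding (1 = minority class) are assumptions made for illustration; they are not taken from the article, and the sketch does not implement FuzzyCSampling itself.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when no positives are predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical imbalanced dataset: 95 majority (0) and 5 minority (1) examples.
y_true = [0] * 95 + [1] * 5
# A degenerate model that always predicts the majority class.
y_pred = [0] * 100

metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Under these assumptions the degenerate model reaches high accuracy while never detecting a minority example, which is the kind of failure that sampling strategies for imbalanced data are meant to address.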

Keywords

Binary classification, imbalanced datasets, machine learning, sampling, fuzzy c-means

First Page

1223

Last Page

1236

Recommended Citation

MARAŞ, ABDULLAH and EROL, ÇİĞDEM (2023) "FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 31: No. 7, Article 6. https://doi.org/10.55730/1300-0632.4044
Available at: https://journals.tubitak.gov.tr/elektrik/vol31/iss7/6

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Turkish Journal of Electrical Engineering and Computer Sciences

FuzzyCSampling: A Hybrid fuzzy c-means clustering sampling strategy for imbalanced datasets
