Turkish Journal of Electrical Engineering and Computer Sciences
DOI
10.55730/1300-0632.3794
Abstract
Centroid-based clustering approaches, such as k-means, are relatively fast but inaccurate for arbitrarily shaped clusters. Fuzzy c-means with the Mahalanobis distance can identify clusters accurately if the data set can be modelled by a mixture of Gaussian distributions. However, these methods require the number of clusters a priori, and a bad initialization can lead to poor results. Density-based clustering methods, such as DBSCAN, overcome these disadvantages but may perform poorly when the data set is imbalanced. This paper proposes a clustering method based on fuzzy clustering, named clustering with density initialization and Bhattacharyya-based merging. The initialization is carried out by density estimation with an adaptive bandwidth, using the k-Nearest Orthant-Neighbor algorithm to avoid the effects of imbalanced clusters. The local peaks of the point clouds constructed by the k-Nearest Orthant-Neighbor algorithm are used as initial cluster centers for the fuzzy clustering. We use the Bhattacharyya measure and Jensen's inequality to find overlapping Gaussians and merge them into a single cluster. Experiments on a variety of data sets show that the proposed algorithm has remarkable advantages, especially for imbalanced and arbitrarily shaped data sets.
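To make the merging idea in the abstract concrete, the following is a minimal sketch of the standard closed-form Bhattacharyya distance between two Gaussian components, with a simple overlap test built on it. It is an illustration only and not the paper's merging criterion: the function names, the `threshold` value of 0.5, and the rule of comparing the Bhattacharyya coefficient with a fixed threshold are assumptions, and the Jensen's inequality step mentioned in the abstract is not reproduced here.

```python
import numpy as np


def bhattacharyya_distance(mu1, cov1, mu2, cov2):
    """Closed-form Bhattacharyya distance between N(mu1, cov1) and N(mu2, cov2)."""
    mu1 = np.asarray(mu1, dtype=float)
    mu2 = np.asarray(mu2, dtype=float)
    cov1 = np.asarray(cov1, dtype=float)
    cov2 = np.asarray(cov2, dtype=float)

    cov = 0.5 * (cov1 + cov2)  # average covariance of the two components
    diff = mu1 - mu2

    # Mean term: (1/8) * diff^T cov^{-1} diff
    mean_term = 0.125 * diff @ np.linalg.solve(cov, diff)

    # Covariance term: (1/2) * ln(det(cov) / sqrt(det(cov1) * det(cov2)))
    _, logdet = np.linalg.slogdet(cov)
    _, logdet1 = np.linalg.slogdet(cov1)
    _, logdet2 = np.linalg.slogdet(cov2)
    cov_term = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))

    return float(mean_term + cov_term)


def should_merge(mu1, cov1, mu2, cov2, threshold=0.5):
    """Hypothetical rule: merge when the Bhattacharyya coefficient exceeds threshold."""
    coefficient = np.exp(-bhattacharyya_distance(mu1, cov1, mu2, cov2))
    return bool(coefficient >= threshold)
```

The Bhattacharyya coefficient `exp(-distance)` lies in (0, 1] and equals 1 only for identical components, so a threshold on it gives a scale-free notion of overlap. In practice the threshold would need tuning per data set.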
Keywords
Infinite mixture models, density estimation, Jensen inequality, bandwidth selection, optimal number of clusters, arbitrarily shaped clusters
First Page
502
Last Page
517
Recommended Citation
KÖSE, ERDEM and HOCAOĞLU, ALİ KÖKSAL (2022) "Clustering with density based initialization and Bhattacharyya based merging," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 30: No. 3, Article 3. https://doi.org/10.55730/1300-0632.3794
Available at:
https://journals.tubitak.gov.tr/elektrik/vol30/iss3/3
Included in
Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons