Centroid-based clustering approaches, such as k-means, are relatively fast but inaccurate for arbitrarily shaped clusters. Fuzzy c-means with the Mahalanobis distance can accurately identify clusters if the data set can be modelled by a mixture of Gaussian distributions. However, these methods require the number of clusters a priori, and a bad initialization can cause poor results. Density-based clustering methods, such as DBSCAN, overcome these disadvantages, but they may perform poorly when the data set is imbalanced. This paper proposes a method based on fuzzy clustering, named clustering with density initialization and Bhattacharyya based merging. The initialization is carried out by density estimation with an adaptive bandwidth, obtained with the k-Nearest Orthant-Neighbor algorithm, to avoid the effects of imbalanced clusters. The local peaks of the point clouds constructed by the k-Nearest Orthant-Neighbor algorithm are used as the initial cluster centers for the fuzzy clustering. We use the Bhattacharyya measure and Jensen's inequality to find overlapping Gaussians and merge them into a single cluster. We carried out experiments on a variety of data sets and show that the proposed algorithm has remarkable advantages, especially for imbalanced and arbitrarily shaped data sets.
Infinite mixture models, density estimation, Jensen inequality, bandwidth selection, optimal number of clusters, arbitrarily shaped clusters
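The abstract names three concrete steps: adaptive-bandwidth density estimation, local density peaks used as initial centers, and merging of overlapping Gaussians with the Bhattacharyya measure. The Python sketch below illustrates these steps under several assumptions. First, plain Euclidean k-nearest neighbours stand in for the k-Nearest Orthant-Neighbor algorithm, whose details are not given here. Second, the fuzzy c-means step is skipped, and Gaussian means and covariances are taken as given. Third, the paper's Jensen-inequality-based merging criterion is replaced by a hypothetical fixed threshold `tau` on the Bhattacharyya coefficient. All function names are illustrative and are not taken from the paper. The coefficient uses the standard closed form for two Gaussians: D_B = (1/8)(μ1 − μ2)ᵀ Σ⁻¹ (μ1 − μ2) + (1/2) ln(det Σ / √(det Σ1 det Σ2)), with Σ = (Σ1 + Σ2)/2 and BC = exp(−D_B).

```python
import numpy as np


def knn_adaptive_density(X, k=10):
    """Sample-point Gaussian KDE with a per-point bandwidth h_j equal to the
    distance from x_j to its k-th nearest neighbour.

    Assumption: plain Euclidean k-NN stands in for the paper's
    k-Nearest Orthant-Neighbor algorithm."""
    n, d = X.shape
    # Full pairwise distance matrix: O(n^2) memory, fine for a small sketch.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Column 0 of each sorted row is the point itself (distance 0).
    h = np.maximum(np.sort(D, axis=1)[:, k], 1e-12)
    # Kernel contribution of point j (bandwidth h_j) evaluated at point i.
    K = np.exp(-0.5 * (D / h[None, :]) ** 2) / ((2 * np.pi) ** (d / 2) * h[None, :] ** d)
    return K.mean(axis=1), D


def density_peaks(X, k=10):
    """Return points whose density is >= that of all their k nearest neighbours."""
    dens, D = knn_adaptive_density(X, k)
    nn = np.argsort(D, axis=1)[:, 1:k + 1]  # skip self at position 0
    is_peak = dens >= dens[nn].max(axis=1)
    return X[is_peak], dens


def bhattacharyya_coefficient(mu1, S1, mu2, S2):
    """exp(-D_B) for two Gaussians: 1 for identical, near 0 for almost no overlap."""
    S = 0.5 * (S1 + S2)
    diff = mu1 - mu2
    maha = diff @ np.linalg.solve(S, diff)
    # slogdet avoids overflow/underflow of raw determinants.
    _, logdet = np.linalg.slogdet(S)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    d_b = maha / 8.0 + 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return float(np.exp(-d_b))


def merge_overlapping(means, covs, tau=0.5):
    """Union-find grouping of components whose Bhattacharyya coefficient
    exceeds tau. Assumption: tau is a hypothetical fixed threshold, not the
    paper's Jensen-inequality-based criterion."""
    parent = list(range(len(means)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            if bhattacharyya_coefficient(means[i], covs[i], means[j], covs[j]) > tau:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(means))]


# Usage: an imbalanced pair of blobs (200 vs 20 points).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0.0, 0.0], 0.5, size=(200, 2)),
               rng.normal([10.0, 0.0], 0.5, size=(20, 2))])
peaks, _ = density_peaks(X, k=10)
labels = merge_overlapping(
    [np.array([0.0, 0.0]), np.array([0.5, 0.0]), np.array([10.0, 0.0])],
    [np.eye(2)] * 3,
)
```

With these inputs, each blob contributes at least one density peak, including the 20-point blob. This holds because the highest-density point of each well-separated blob has only same-blob neighbours. The two nearby unit-covariance components are grouped, while the distant third stays separate. The full distance matrix keeps the sketch simple, but a spatial index would be needed for large data sets.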
KÖSE, ERDEM and HOCAOĞLU, ALİ KÖKSAL, "Clustering with density based initialization and Bhattacharyya based merging," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 30: No. 3, Article 3.
Available at: https://journals.tubitak.gov.tr/elektrik/vol30/iss3/3