Clustering ensemble selection has proven highly effective at improving the quality of clustering solutions. The technique relies on two important metrics: diversity and quality. It has been shown empirically that more effective ensembles can be obtained by considering diversity and quality simultaneously. However, the relationship between these two metrics in base clusterings remains unclear. This paper proposes a new hierarchical selection algorithm that uses a diversity/quality measure based on the Jaccard similarity measure. In the proposed algorithm, subsets of the clustering partitions are selected according to their diversity measures. The proposed diversity measure is applied in two forms: pair-wise diversity and hybrid diversity. The hypergraph partitioning algorithm (HGPA), the cluster-based similarity partitioning algorithm (CSPA), and the meta-clustering algorithm (MCLA) were used to obtain the consensus solution and the cluster ensemble selection results with the hierarchical method. Experimental results on 14 datasets showed that selecting a subset of base clusterings with the proposed algorithm yielded more accurate results than the full ensemble. These comparisons demonstrated the effectiveness and robustness of the proposed algorithm, and the proposed method with the new diversity measure outperformed the full ensemble.
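The idea of a Jaccard-based diversity between base clusterings can be illustrated with a minimal sketch. It uses the standard pair-counting Jaccard index between two partitions and takes diversity as one minus that similarity; this is an assumption made for illustration and is not the extended Jaccard measure or the hierarchical selection algorithm of the paper. The function names (pair_jaccard, pairwise_diversity, mean_pairwise_diversity) are hypothetical.

```python
from itertools import combinations


def pair_jaccard(labels_a, labels_b):
    """Pair-counting Jaccard index between two partitions of the same objects.

    Counts object pairs placed together in both partitions, divided by the
    pairs placed together in at least one of them.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both partitions must label the same objects")
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    denom = both + only_a + only_b
    # If no pair is co-clustered in either partition, treat them as identical.
    return 1.0 if denom == 0 else both / denom


def pairwise_diversity(labels_a, labels_b):
    """Illustrative diversity: 1 - Jaccard similarity (an assumption)."""
    return 1.0 - pair_jaccard(labels_a, labels_b)


def mean_pairwise_diversity(ensemble):
    """Average pairwise diversity over all pairs of base clusterings."""
    pairs = list(combinations(ensemble, 2))
    if not pairs:
        return 0.0
    return sum(pairwise_diversity(a, b) for a, b in pairs) / len(pairs)
```

Because the index counts co-clustered pairs rather than label values, two partitions that differ only in how their clusters are numbered have a diversity of zero.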
Cluster ensemble selection, diversity, quality, extended Jaccard measure
KHALILI, HAJAR; RABBANI, MOHSEN; and AKBARI, EBRAHIM
"Clustering ensemble selection based on the extended Jaccard measure,"
Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 29: No. 4, Article 23.
Available at: https://journals.tubitak.gov.tr/elektrik/vol29/iss4/23