
Turkish Journal of Electrical Engineering and Computer Sciences

DOI

10.3906/elk-1801-157

Abstract

Support vector machines (SVMs) are rarely used to classify very large datasets because training and testing on such data are computationally expensive: the traditional SVM solution has $O(n^{2})$ time complexity, which makes it impractical at that scale. Many researchers have tried to reduce SVM training time with sample reduction methods, many of which use a clustering technique to shrink the training set. However, such methods are not effective at extracting informative patterns. This paper presents a new supervised classification method, the multiseed-based SVM (MSB-SVM), which is intended for multiclass classification of very large datasets. The main contributions of the paper are (i) an efficient multiseed technique for selecting seed points from circular or elongated class training samples, (ii) adjacent class pair selection from the set of multiseeds using the minimum spanning tree, and (iii) extraction of support vectors from class pair seed equivalent regions, which handles multiclass classification problems without being computationally expensive. The multiseed point technique depends on the estimated density of each data point, which is computed in $O(n \log n)$ time; given the estimated densities, the seed selection algorithm costs $O(n)$. This is the only overhead of reducing the sample, and it takes less time than the clustering methods. The number of support vectors is also sharply reduced, so finding the decision surface takes less time. Experimental results on a variety of datasets showed better training and testing times than other sample reduction methods. In addition, the classification accuracy of the proposed technique is significantly better than that of other existing sample reduction methods, especially for large datasets.
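
As an illustration of contribution (ii), the following is a minimal sketch of how adjacent class pairs might be read off a minimum spanning tree built over seed points. It is not the paper's implementation: the function name `adjacent_class_pairs`, the inputs `seeds` and `labels`, the use of Euclidean distance, and the rule that two classes count as adjacent when an MST edge joins seeds of different classes are all assumptions made for this example.

```python
import math


def adjacent_class_pairs(seeds, labels):
    """Return class pairs joined by an edge of the Euclidean MST over the seeds.

    seeds  -- list of coordinate tuples (hypothetical seed points)
    labels -- class label of each seed, same order as seeds
    """
    n = len(seeds)
    if n == 0:
        return set()

    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known edge from the tree to each vertex
    best_from = [-1] * n         # tree endpoint of that cheapest edge
    best_dist[0] = 0.0
    pairs = set()

    # Prim's algorithm on the complete graph of seeds, O(k^2) for k seeds.
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True

        # Assumed adjacency rule: an MST edge between seeds of different classes.
        v = best_from[u]
        if v >= 0 and labels[u] != labels[v]:
            pairs.add(tuple(sorted((labels[u], labels[v]))))

        for w in range(n):
            if not in_tree[w]:
                d = math.dist(seeds[u], seeds[w])
                if d < best_dist[w]:
                    best_dist[w] = d
                    best_from[w] = u
    return pairs


# Three classes laid out along a line: 0 and 1 are neighbours, as are 1 and 2.
seeds = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0), (10.0, 0.0)]
labels = [0, 0, 1, 1, 2]
print(adjacent_class_pairs(seeds, labels))
```

Under these assumptions, a binary SVM would only need to be trained for the class pairs returned here, rather than for every pair of classes, which is one plausible reading of how the approach limits the cost of multiclass classification.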

First Page

595

Last Page

604
