DOI

10.3906/elk-1302-68

Abstract

Although the support vector machine (SVM) algorithm generalizes well to unseen examples after the training phase and achieves a small loss value, it scales poorly to large real-life classification and regression problems: conventional SVM training cannot handle datasets with hundreds of thousands of examples. In previous studies on distributed machine-learning algorithms, the SVM was trained in a costly, preconfigured computing environment. In this research, we present a MapReduce-based distributed parallel SVM training algorithm for binary classification problems. This work shows how to distribute optimization problems over cloud computing systems with the MapReduce technique. In the second step of this work, we use statistical learning theory to find the predictive hypothesis that minimizes the empirical risk over the hypothesis spaces created by the Reduce function of MapReduce. The results of this research are important for training SVM-based classifiers on big datasets. We train the split dataset iteratively with the MapReduce technique, and the accuracy of the resulting classifier function converges to that of the globally optimal classifier function in a finite number of iterations. The performance of the algorithm was measured on samples from the letter recognition and pen-based recognition of handwritten digits datasets.
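The iterative scheme summarized in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it simulates the Map and Reduce steps sequentially in one process rather than on a Hadoop cluster, and it assumes a linear SVM without a bias term trained by Pegasos-style subgradient descent. All function names (`train_linear_svm`, `map_phase`, `reduce_phase`, `mapreduce_svm`, `make_toy_data`) and parameter values are hypothetical choices made for illustration.

```python
import random

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(points, lam=0.01, epochs=50, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (assumption: no bias term)."""
    rng = random.Random(seed)
    w = [0.0] * len(points[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in rng.sample(points, len(points)):
            t += 1
            eta = 1.0 / (lam * t)                    # decaying step size
            violated = y * dot(w, x) < 1.0           # margin check with the current w
            w = [(1.0 - 1.0 / t) * wi for wi in w]   # shrink step from the regularizer
            if violated:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def empirical_risk(w, data):
    """Fraction of misclassified examples (0-1 loss)."""
    return sum(1 for x, y in data if y * dot(w, x) <= 0.0) / len(data)

def map_phase(chunk, global_svs):
    """Map: train on one chunk plus the current global support vectors."""
    data = chunk + [p for p in global_svs if p not in chunk]
    w = train_linear_svm(data)
    # Treat points on or inside the margin as this sub-model's support vectors.
    svs = [(x, y) for x, y in data if y * dot(w, x) <= 1.0]
    return w, svs

def reduce_phase(mapped):
    """Reduce: collect the candidate hypotheses and merge the support vectors."""
    hypotheses = [w for w, _ in mapped]
    merged = []
    for _, svs in mapped:
        for p in svs:
            if p not in merged:
                merged.append(p)
    return hypotheses, merged

def mapreduce_svm(data, n_chunks=4, max_iter=10, tol=1e-6):
    """Iterate Map and Reduce until the best empirical risk stops changing."""
    chunks = [data[i::n_chunks] for i in range(n_chunks)]
    global_svs, best_w, prev_risk = [], None, float("inf")
    for _ in range(max_iter):
        mapped = [map_phase(c, global_svs) for c in chunks]
        hypotheses, global_svs = reduce_phase(mapped)
        # Pick the hypothesis with the smallest empirical risk on the full data.
        best_w = min(hypotheses, key=lambda w: empirical_risk(w, data))
        risk = empirical_risk(best_w, data)
        if abs(prev_risk - risk) <= tol:
            break
        prev_risk = risk
    return best_w, global_svs

def make_toy_data(n=40, seed=1):
    """Two well-separated 2-D clusters with labels +1 and -1 (synthetic, for the demo only)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1 if i % 2 == 0 else -1
        data.append(((y * (2.0 + rng.random()), y * (2.0 + rng.random())), y))
    return data
```

Calling `mapreduce_svm(make_toy_data())` returns the selected weight vector together with the merged support-vector set. In a real deployment each `map_phase` call would run as a separate Map task, and a kernel SVM solver would likely replace the toy trainer used here.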

Keywords

Support vector machine, machine learning, cloud computing, MapReduce, large-scale dataset

First Page

863

Last Page

873

Recommended Citation

ÇATAK, FERHAT ÖZGÜR and BALABAN, MEHMET ERDAL (2016) "A MapReduce-based distributed SVM algorithm for binary classification," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 24: No. 3, Article 12. https://doi.org/10.3906/elk-1302-68
Available at: https://journals.tubitak.gov.tr/elektrik/vol24/iss3/12

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Turkish Journal of Electrical Engineering and Computer Sciences

A MapReduce-based distributed SVM algorithm for binary classification
