Turkish Journal of Electrical Engineering and Computer Sciences

Design development and performance analysis of distributed least square twin support vector machine for binary classification

Abstract

Machine learning (ML) on Big Data has outgrown the capacity of traditional machines and technologies, and ML for large-scale datasets is a current focus of researchers. Most ML algorithms suffer from memory constraints, complex computation, and scalability issues. The least square twin support vector machine (LSTSVM) is an extension of the support vector machine (SVM). It is much faster than SVM and is widely used for classification tasks. However, when applied to large-scale datasets with millions or billions of samples and/or a large number of classes, it runs into computational and storage bottlenecks. This paper proposes a novel scalable design for LSTSVM named distributed LSTSVM (DLSTSVM). The design exploits distributed computation on a cluster of machines to provide a scalable solution to LSTSVM. Very large datasets are partitioned and distributed as resilient distributed datasets on top of the Spark cluster computing engine. LSTSVM is trained to generate two nonparallel hyperplanes, which are obtained by solving two systems of linear equations, each of which involves data instances from either class. In designing DLSTSVM, we employed distributed matrix operations using the MapReduce paradigm to distribute the tasks over multiple machines in the cluster. Thus, memory constraints with extremely large datasets are averted. Experimental results show a reduction in time complexity compared with existing scalable solutions to SVM and its variants. Moreover, detailed experiments demonstrate the scalability of the proposed design with respect to large datasets.
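
The two linear systems mentioned in the abstract can be illustrated with a small single-machine sketch. It is not the authors' Spark implementation. It assumes the standard linear LSTSVM closed form, emulates the map and reduce steps with plain Python over in-memory partitions, and uses NumPy. The function names (`map_partition`, `reduce_stats`, `fit_lstsvm`, `predict`), the penalty parameters `c1` and `c2`, the small ridge term `eps`, and the normalized-distance decision rule are all assumptions made for illustration.

```python
from functools import reduce

import numpy as np


def map_partition(X, y):
    """Map step (hypothetical helper): per-partition statistics for each class.

    E holds the augmented rows [x, 1] of class +1, F those of class -1.
    Only small (n+1) x (n+1) matrices and (n+1)-vectors leave the partition.
    """
    G = np.hstack([X, np.ones((X.shape[0], 1))])
    E, F = G[y == 1], G[y == -1]
    # E.sum(axis=0) equals E^T e, where e is a vector of ones.
    return E.T @ E, E.sum(axis=0), F.T @ F, F.sum(axis=0)


def reduce_stats(a, b):
    """Reduce step: statistics from different partitions simply add up."""
    return tuple(p + q for p, q in zip(a, b))


def fit_lstsvm(partitions, c1=1.0, c2=1.0, eps=1e-8):
    """Solve the two LSTSVM linear systems from the aggregated statistics.

    u = [w1; b1] = -(F^T F + E^T E / c1)^(-1) F^T e
    v = [w2; b2] =  (E^T E + F^T F / c2)^(-1) E^T e
    eps adds a small ridge for numerical stability (an assumption).
    """
    stats = (map_partition(X, y) for X, y in partitions)
    EtE, Ete, FtF, Fte = reduce(reduce_stats, stats)
    ridge = eps * np.eye(EtE.shape[0])
    u = -np.linalg.solve(FtF + EtE / c1 + ridge, Fte)
    v = np.linalg.solve(EtE + FtF / c2 + ridge, Ete)
    return u, v


def predict(X, u, v):
    """Assign each sample to the class whose hyperplane is closer."""
    G = np.hstack([X, np.ones((X.shape[0], 1))])
    d1 = np.abs(G @ u) / np.linalg.norm(u[:-1])
    d2 = np.abs(G @ v) / np.linalg.norm(v[:-1])
    return np.where(d1 <= d2, 1, -1)
```

Because the per-partition statistics are additive, the solution does not depend on how the rows are split, which is what makes this kind of computation a natural fit for a MapReduce-style reduction. In a cluster setting the partitions would be distributed blocks of the dataset rather than in-memory arrays.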

DOI

10.3906/elk-2008-155

Keywords

Distributed machine learning, Big Data, cluster computing, least square twin support vector machine (LSTSVM), MapReduce, parallel processing

First Page

2934

Last Page

2949

Recommended Citation

PRASAD, B. R., & AGARWAL, S. (2021). Design development and performance analysis of distributed least square twin support vector machine for binary classification. Turkish Journal of Electrical Engineering and Computer Sciences 29 (7): 2934-2949. https://doi.org/10.3906/elk-2008-155

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons
