DOI

10.3906/elk-1707-6

Abstract

Apache Spark is one of the most technically challenging frameworks for cluster computing, in which data are processed in parallel across a cluster of unreliable machines. It processes large amounts of data faster than the MapReduce framework. To provide optimized, fast SQL query processing, a new module named Spark SQL was developed in Apache Spark. It allows users to combine relational processing and functional programming in one place. Spark SQL provides many optimizations by building on the Spark core; the component responsible is called the catalyst optimizer, and it applies many rules to optimize queries for efficient execution. In this paper, we discuss a scenario in which the catalyst optimizer is not able to optimize a query effectively. The result is inefficient memory usage and a longer execution time for the query in Spark SQL. To address this issue, we propose a solution that optimizes the query further, which significantly reduces the time and memory consumed by the shuffling process.

Keywords

Shuffling, pushdown filter, rules, joins, catalyst optimizer

First Page

2489

Last Page

2499

Recommended Citation

CHAWLA, MEENU and BANIWAL, VINITA (2018) "Optimization in the catalyst optimizer of Spark SQL," Turkish Journal of Electrical Engineering and Computer Sciences: Vol. 26: No. 5, Article 27. https://doi.org/10.3906/elk-1707-6
Available at: https://journals.tubitak.gov.tr/elektrik/vol26/iss5/27

Included in

Computer Engineering Commons, Computer Sciences Commons, Electrical and Computer Engineering Commons

Turkish Journal of Electrical Engineering and Computer Sciences

Optimization in the catalyst optimizer of Spark SQL
