Improvement of air pollution prediction in a smart city and its correlation with weather conditions using meteorological big data


Abstract: Smart cities are an important concept in urban development, one that addresses many critical urban problems, including traffic and environmental pollution. As the use of the Internet of things and related technology in smart cities increases, large volumes of data are generated and collected by sensors embedded at different places in the city, providing a real-time view of what is happening throughout the city at all times. Such data are valuable only if they are processed and analyzed correctly, so that the services provided remain effective and improve in quality. Big data mining is the most effective method for analyzing such data. In this paper, with the aim of increasing speed and accuracy in predicting actual air pollution levels, their location, and the effects of weather conditions on air pollution density, a K-means clustering algorithm from the Mahout library is used as a big data mining tool on datasets from the City Pulse project. The results of this study show that temperature, low air pressure, a relative increase in humidity, and wind speed are the causes of low pollution density at the cleanest point of the city. The sum of squared errors (SSE) evaluation metric shows the high speed of this clustering method for big data, and the statistical measures obtained, root mean square error (RMSE) = 0.632 and mean square error (MSE) = 0.488, indicate the high efficiency and accuracy of the proposed method in predicting air pollution.

Keywords: Smart city, Internet of things, big data mining, K-means clustering, Mahout, air pollution
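
As an illustration of the quantities named in the abstract, the following is a minimal sketch of K-means clustering with its SSE, together with the MSE and RMSE measures. It is written in plain Python for readability and is not the Mahout implementation used in the study. The function names (`assign`, `kmeans`, `mse`, `rmse`), the toy readings, and the initial centroids are all assumptions made for this example and do not come from the City Pulse datasets.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm on tuples of floats; returns (centroids, labels, sse)."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster where it is
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    labels = assign(points, centroids)
    # SSE: sum of squared distances from each point to its cluster centroid.
    sse = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, sse


def mse(actual, predicted):
    """Mean square error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical two-feature readings (illustrative values only).
readings = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centers, labels, sse = kmeans(readings, [(0.0, 0.0), (10.0, 10.0)])
print(labels, round(sse, 4))

# Hypothetical observed and predicted pollution values.
print(mse([3.0, 5.0], [2.0, 7.0]), rmse([3.0, 5.0], [2.0, 7.0]))
```

A lower SSE indicates tighter clusters, while MSE and RMSE measure how far predicted values fall from observed ones. By definition, RMSE is the square root of MSE when both are computed on the same data.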

