Design of a Spark Big Data Framework for PM2.5 Air Pollution Forecasting
July 21, 2021
by Dong-Her Shih, Thi Hien To, Ly Sy Phu Nguyen, Ting-Wei Wu and Wen-Ting You
Abstract
In recent years, rapid economic development has made air pollution extremely serious, with negative effects on health, the environment, and medical costs. PM2.5 is one of the main components of air pollution, so it is necessary to know PM2.5 air quality in advance. This study proposes a real-time PM2.5 prediction architecture based on the Spark big data framework. The architecture is divided into three modules: a data collection and processing module, a big data training and prediction module, and a data visualization module. It collects PM2.5 data in real time and combines three machine learning algorithms into an ensemble learning model that predicts the PM2.5 concentration over the next 30 to 180 minutes. The experimental results show that the ensemble prediction model performs very well. This study aims to provide real-time prediction and monitoring of PM2.5 concentration through the Spark big data framework architecture, which may reduce the social and economic problems caused by air pollution.
Keywords: Air Pollution; PM2.5 predictions; Machine Learning; Spark; Ensemble model; Big data
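
To make the ensemble idea in the abstract concrete, here is a minimal Python sketch of combining three base predictors into one forecast for horizons from 30 to 180 minutes. The abstract does not name the three machine learning algorithms, the combination rule, or the sampling interval, so everything here is an assumption for illustration: the three base models are simple stand-ins (not the paper's algorithms), the ensemble is an unweighted average, readings are assumed to arrive every 30 minutes, and multi-step forecasts are produced recursively. It also runs in plain Python rather than on Spark.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

# A base model maps a window of recent PM2.5 readings to a one-step forecast.
Model = Callable[[Sequence[float]], float]


def persistence(window: Sequence[float]) -> float:
    """Stand-in model 1: repeat the last observed value."""
    return window[-1]


def moving_average(window: Sequence[float]) -> float:
    """Stand-in model 2: average of the window."""
    return mean(window)


def linear_trend(window: Sequence[float]) -> float:
    """Stand-in model 3: extend the average step-to-step change by one step."""
    if len(window) < 2:
        return window[-1]
    slope = (window[-1] - window[0]) / (len(window) - 1)
    return window[-1] + slope


# Hypothetical ensemble members; the paper's actual algorithms are not given here.
MODELS: List[Model] = [persistence, moving_average, linear_trend]


def ensemble_predict(models: Sequence[Model], window: Sequence[float]) -> float:
    """Unweighted average of the base-model forecasts (assumed combination rule)."""
    return mean([m(window) for m in models])


def forecast_horizons(
    models: Sequence[Model],
    window: Sequence[float],
    steps: int = 6,
    step_minutes: int = 30,
) -> Dict[int, float]:
    """Recursive multi-step forecast: 6 steps of 30 minutes cover 30..180 minutes."""
    current = list(window)
    out: Dict[int, float] = {}
    for k in range(1, steps + 1):
        pred = ensemble_predict(models, current)
        out[k * step_minutes] = pred
        # Slide the window forward, treating the forecast as the newest reading.
        current = current[1:] + [pred]
    return out


if __name__ == "__main__":
    recent = [10.0, 12.0, 14.0]  # made-up readings, in µg/m³
    print(forecast_horizons(MODELS, recent))
```

In a Spark deployment, the base models would presumably be trained and applied on distributed data, but the averaging step would keep the same shape: each model yields a prediction per horizon, and the ensemble combines them.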