
July 21, 2021

Design of a Spark Big Data Framework for PM2.5 Air Pollution Forecasting
by Dong-Her Shih, Thi Hien To, Ly Sy Phu Nguyen, Ting-Wei Wu and Wen-Ting You 

Abstract
In recent years, rapid economic development has made air pollution extremely serious. This has harmed health and the environment and raised medical costs. PM2.5 is one of the main components of air pollution, so it is necessary to know PM2.5 air quality in advance. This study proposes a real-time PM2.5 prediction architecture based on the Spark big data framework. The architecture is divided into three modules: a data collection and processing module, a big data training and prediction module, and a data visualization module. It collects PM2.5 data in real time. It then builds an ensemble learning model from three machine learning algorithms to predict the PM2.5 concentration 30 to 180 minutes ahead. The experimental results show that the ensemble prediction model performs well. Through this Spark-based architecture, the study aims to provide real-time prediction and monitoring of PM2.5 concentrations, which may help reduce the social and economic problems caused by air pollution.
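The ensemble step described in the abstract can be sketched in a few lines. This is a minimal, hypothetical illustration only. The abstract does not name the three algorithms, so three simple stand-in forecasters are used here: persistence, moving average and least-squares linear trend. The sketch also assumes one PM2.5 reading every 10 minutes (the hypothetical `STEP_MINUTES`) and combines the base predictions by an unweighted mean. Spark is left out entirely; only the idea of averaging several base predictors over a 30 to 180 minute horizon is shown.

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical assumption: one PM2.5 reading arrives every 10 minutes,
# so a 30-180 minute horizon corresponds to 3-18 steps ahead.
STEP_MINUTES = 10


def persistence(history: Sequence[float], steps: int) -> float:
    """Stand-in model 1: the future value equals the latest reading (ignores steps)."""
    return history[-1]


def moving_average(history: Sequence[float], steps: int, window: int = 6) -> float:
    """Stand-in model 2: the mean of the most recent `window` readings (ignores steps)."""
    return mean(history[-window:])


def linear_trend(history: Sequence[float], steps: int, window: int = 6) -> float:
    """Stand-in model 3: fit y = a + b*t on the last `window` points and extrapolate."""
    ys = list(history[-window:])
    n = len(ys)
    if n < 2:
        return ys[-1]
    ts = range(n)
    t_bar = (n - 1) / 2
    y_bar = mean(ys)
    sxx = sum((t - t_bar) ** 2 for t in ts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(ts, ys))
    slope = sxy / sxx
    # The last observed point sits at t = n - 1; predict `steps` points beyond it.
    return y_bar + slope * (n - 1 + steps - t_bar)


# Hypothetical base models; the paper's three algorithms are not named in the abstract.
BASE_MODELS: list[Callable[[Sequence[float], int], float]] = [
    persistence,
    moving_average,
    linear_trend,
]


def ensemble_forecast(history: Sequence[float], horizon_minutes: int) -> float:
    """Average the base-model forecasts for a horizon given in minutes."""
    if horizon_minutes % STEP_MINUTES != 0:
        raise ValueError("horizon must be a multiple of STEP_MINUTES")
    steps = horizon_minutes // STEP_MINUTES
    predictions = [model(history, steps) for model in BASE_MODELS]
    # A concentration cannot be negative, so clip the averaged forecast at zero.
    return max(0.0, mean(predictions))


if __name__ == "__main__":
    readings = [20, 22, 24, 26, 28, 30]  # made-up readings for illustration
    for horizon in range(30, 181, 30):
        print(horizon, round(ensemble_forecast(readings, horizon), 2))
```

With the made-up readings above, the 30-minute forecast prints `30.33` and the 180-minute forecast prints `40.33`. Only the trend model reacts to the horizon, which is one reason to combine it with more conservative predictors. A weighted average or a stacked meta-learner would be a natural alternative to the plain mean.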
 
Keywords: Air Pollution; PM2.5 predictions; Machine Learning; Spark; Ensemble model; Big data

Article link: https://www.mdpi.com/1174760
 

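Taiwan-Vietnam Overseas Scientific Research Center for Environmental Protection

No. 123, Section 3, University Road, Douliu City, Yunlin County 640301, Taiwan / Tel: 05-534-2601 #2750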

Copyright © 2021 YunTech.