A Scalable and Stacked Ensemble Approach to Improve Intrusion Detection in Clouds
Keywords: Cloud Security, Big Data, Machine Learning, Intrusion Detection System, Apache Spark, Stacking
The availability of automated data collection techniques and the growth in the volume of data collected from cloud network traffic and cloud resource activities have created a big data challenge, requiring big data tools to handle, manage, and interpret the data. A single classification method may fail to cope with the volume of acquired data. Although they are more complex and consume more computational resources, research shows that stacking-based ensemble Machine Learning (ML) methods classify data better than single classifiers. This research proposes two Intrusion Detection Systems (IDS), both based on an ensemble of ML algorithms built on the Stacked Generalization Approach (SGA) and big data technology. The proposed approaches are tested and evaluated on the NSL-KDD and UNSW-NB15 datasets, using a Gain Ratio (GR) based Feature Selection (FS) approach; the J48, OneR, Support Vector Machine (SVM), Random Forest (RF), Multilayer Perceptron (MLP), and Extreme Gradient Boosting (XGBoost) classifiers; and Apache Spark, a prominent big data processing platform. The first technique involves storing data on HDFS, while the second involves selecting the most suitable subset of base classifiers for stacking. A thorough performance investigation shows that our proposed model outperforms other current IDS models in terms of accuracy, False Positive Rate (FPR), or other performance metrics when detecting intrusions in the cloud.
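To illustrate the Gain Ratio criterion named above, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes discrete feature values (continuous features would first need binning), and the function names, feature names, and toy records are hypothetical. Gain Ratio is computed in the standard way, as information gain divided by the split information of the feature.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by split information."""
    total = len(labels)
    # Group the class labels by feature value.
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected entropy of the labels after splitting on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    # Split information is the entropy of the feature's own value distribution.
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


def rank_features(rows, labels, names):
    """Return (name, gain ratio) pairs sorted from most to least informative."""
    scores = {name: gain_ratio([row[i] for row in rows], labels)
              for i, name in enumerate(names)}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: (protocol, flag) with a class label each.
rows = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"), ("udp", "S0")]
labels = ["normal", "attack", "normal", "attack"]
ranked = rank_features(rows, labels, ["protocol", "flag"])
print(ranked)
```

In this toy data the `flag` column separates the two classes perfectly while `protocol` carries no information, so `flag` is ranked first. A feature selection step of this kind would keep only the top-ranked features before the classifiers are trained.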
Copyright terms are indicated in the Republic of Lithuania Law on Copyright and Related Rights, Articles 4-37.