Size | Format |
---|---|
3.15 MB | Adobe PDF |
Abstract(s)
The absence of essential security protocols in Industrial Internet of Things (IIoT) networks introduces cybersecurity vulnerabilities and turns these networks into potential targets for various types of attack.
Although machine learning has been used for intrusion detection in the IIoT, datasets that represent
common attacks on IIoT network traffic are scarce and often imbalanced. Data augmentation techniques
address these problems by creating artificial data in classes with fewer samples. In this work, we evaluate the
use of data augmentation when training intrusion detection models based on IIoT traffic data. We compare
Generative Pre-trained Transformers (GPT) and variations on the Synthetic Minority Over-sampling
TEchnique (SMOTE) and evaluate their capability to enhance intrusion detection performance. We compare
the performance of five intrusion detection algorithms trained with augmented datasets against models
trained with the original, non-augmented dataset. To ensure a fair comparison, we evaluate the algorithms’
performance in the different scenarios using the same test dataset, which does not contain synthetic data.
The results show the need for a systematic evaluation before employing data augmentation, as its impact on
classification performance depends on the algorithm, the data, and the technique used. While deep neural networks
benefit from data augmentation, eXtreme Gradient Boosting (XGBoost), which achieved the best
intrusion detection performance among all evaluated classifiers (with an F1-score over 91%), showed no
performance improvement when trained with augmented data. The evaluation of data generated by
GPT-based methods shows that such methods (especially GReaT) generate invalid data for both numerical and
categorical features in a way that leads to performance degradation in multiclass classification.
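
As an illustration of the interpolation idea behind SMOTE-style augmentation and of keeping synthetic data out of the test set, the following is a minimal sketch and not the authors' implementation. The function name `smote_like_oversample`, the toy Gaussian data, and all parameter values are assumptions made for this example; it covers only plain neighbour interpolation on numerical features, not the SMOTE variations or GPT-based generators evaluated in the paper.

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a sampled
    minority row and one of its k nearest minority-class neighbours.
    Hypothetical helper for illustration; needs at least 2 rows."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)
    # Pairwise Euclidean distances within the minority class.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is never its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    # Pick a base row and one of its neighbours for each synthetic row.
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[picked] - X_min[base])

# Toy minority class (assumed data, not IIoT traffic).
rng = np.random.default_rng(42)
X_minor = rng.normal(3.0, 1.0, size=(20, 4))

# Hold out real samples for testing BEFORE augmenting, so the test
# set contains no synthetic data.
X_minor_train, X_minor_test = X_minor[:15], X_minor[15:]
X_synth = smote_like_oversample(X_minor_train, n_new=50)
X_minor_train_aug = np.vstack([X_minor_train, X_synth])
```

Because each synthetic row is a convex combination of two real rows, every numerical feature stays within the range observed in the training data. Plain interpolation of this kind does not handle categorical features.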
Description
This work was supported in part by the Fundação para a Ciência e a Tecnologia (FCT), I.P., under Project UIDB/04524/2020; in part by the Scientific Employment Stimulus-Institutional Call under Grant CEECINST/00051/2018; and in part by the Agência Nacional de Inovação (ANI), S.A., under Project POCI-01-0247-FEDER-046083.
Keywords
IIoT; Cybersecurity; Data augmentation; Machine learning
Citation
Melicias, F. S., Ribeiro, T. F. R., Rabadao, C., Santos, L., & Costa, R. L. D. C. (2024). GPT and Interpolation-Based Data Augmentation for Multiclass Intrusion Detection in IIoT. IEEE Access, 12, 17945–17965. https://doi.org/10.1109/ACCESS.2024.3360879
Publisher
IEEE