Publication

GPT and Interpolation-Based Data Augmentation for Multiclass Intrusion Detection in IIoT

dc.contributor.author: Melicias, Francisco S.
dc.contributor.author: Ribeiro, Tiago F. R.
dc.contributor.author: Rabadão, Carlos
dc.contributor.author: Santos, Leonel
dc.contributor.author: Costa, Rogério Luís de C.
dc.date.accessioned: 2024-07-26T15:44:32Z
dc.date.available: 2024-07-26T15:44:32Z
dc.date.issued: 2024
dc.date.updated: 2024-07-26T10:13:11Z
dc.description: This work was supported in part by the Fundação para a Ciência e a Tecnologia (FCT), I.P., under Project UIDB/04524/2020; in part by the Scientific Employment Stimulus-Institutional Call under Grant CEECINST/00051/2018; and in part by the Agência Nacional de Inovação (ANI), S.A., under Project POCI-01-0247-FEDER-046083. [pt_PT]
dc.description.abstract: The absence of essential security protocols in Industrial Internet of Things (IIoT) networks introduces cybersecurity vulnerabilities and turns them into potential targets for various attack types. Although machine learning has been used for intrusion detection in the IIoT, datasets containing representative data on common attacks in IIoT network traffic are limited and often imbalanced. Data augmentation techniques address these problems by creating artificial data in classes with fewer samples. In this work, we evaluate the use of data augmentation when training intrusion detection models based on IIoT traffic data. We compare Generative Pre-trained Transformers (GPT) and variations on the Synthetic Minority Over-sampling TEchnique (SMOTE) and evaluate their capability to enhance intrusion detection performance. We compare the performance of five intrusion detection algorithms trained with augmented datasets against that of models trained with the original non-augmented dataset. To ensure a fair comparison, we evaluate the algorithms' performance in the different scenarios using the same test dataset, which does not contain synthetic data. The results show the need for a systematic evaluation before employing data augmentation, as its impact on classification performance depends on the algorithm, the data, and the technique used. While deep neural networks benefit from data augmentation, eXtreme Gradient Boosting (XGBoost), which achieved the best intrusion detection performance among all evaluated classifiers (with an F1-score over 91%), showed no performance improvement when trained with augmented data. The evaluation of data generated by GPT-based methods shows that such methods (especially GReaT) generate invalid data for both numerical and categorical features in a way that leads to performance degradation in multiclass classification. [pt_PT]
dc.description.version: info:eu-repo/semantics/publishedVersion [pt_PT]
dc.identifier.citation: Melicias, F. S., Ribeiro, T. F. R., Rabadao, C., Santos, L., & Costa, R. L. D. C. (2024). GPT and Interpolation-Based Data Augmentation for Multiclass Intrusion Detection in IIoT. IEEE Access, 12, 17945–17965. https://doi.org/10.1109/ACCESS.2024.3360879 [pt_PT]
dc.identifier.doi: https://doi.org/10.1109/ACCESS.2024.3360879 [pt_PT]
dc.identifier.eissn: 2169-3536
dc.identifier.slug: cv-prod-3748901
dc.identifier.uri: http://hdl.handle.net/10400.8/9867
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: IEEE [pt_PT]
dc.relation: Research Center in Informatics and Communications
dc.relation: Not Available
dc.relation.publisherversion: https://ieeexplore.ieee.org/document/10418592 [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: IIoT [pt_PT]
dc.subject: Cybersecurity [pt_PT]
dc.subject: Data augmentation [pt_PT]
dc.subject: Machine learning [pt_PT]
dc.title: GPT and Interpolation-Based Data Augmentation for Multiclass Intrusion Detection in IIoT [pt_PT]
dc.type: journal article
dspace.entity.type: Publication
oaire.awardTitle: Research Center in Informatics and Communications
oaire.awardTitle: Not Available
oaire.awardURI: info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F04524%2F2020/PT
oaire.awardURI: info:eu-repo/grantAgreement/FCT/CEEC INST 2018/CEECINST%2F00051%2F2018%2FCP1566%2FCT0001/PT
oaire.citation.endPage: 17965 [pt_PT]
oaire.citation.startPage: 17945 [pt_PT]
oaire.citation.title: IEEE Access [pt_PT]
oaire.citation.volume: 12 [pt_PT]
oaire.fundingStream: 6817 - DCRRNI ID
oaire.fundingStream: CEEC INST 2018
person.familyName: F. R. Ribeiro
person.familyName: Rabadão
person.familyName: Simões Santos
person.familyName: de Carvalho Costa
person.givenName: Tiago
person.givenName: Carlos
person.givenName: Leonel Filipe
person.givenName: Rogério Luís
person.identifier.ciencia-id: 8012-CF64-BE02
person.identifier.ciencia-id: 5218-1B8A-8ACE
person.identifier.ciencia-id: 2C1C-E900-6A57
person.identifier.ciencia-id: C212-374B-3FF1
person.identifier.ciencia-id: 7717-9573-0C0F
person.identifier.orcid: 0000-0003-1603-1218
person.identifier.orcid: 0000-0001-7332-4397
person.identifier.orcid: 0000-0002-6883-7996
person.identifier.orcid: 0000-0003-2306-7585
person.identifier.rid: M-3235-2013
person.identifier.rid: A-7940-2016
person.identifier.scopus-author-id: 22433497800
person.identifier.scopus-author-id: 57203544345
person.identifier.scopus-author-id: 7801604983
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.identifier: http://doi.org/10.13039/501100001871
project.funder.name: Fundação para a Ciência e a Tecnologia
project.funder.name: Fundação para a Ciência e a Tecnologia
rcaap.cv.cienciaid: 7717-9573-0C0F | Rogério Luís de Carvalho Costa
rcaap.rights: openAccess [pt_PT]
rcaap.type: article [pt_PT]
relation.isAuthorOfPublication: bf9f066e-1fdf-4399-bfd4-f07fc7fb3103
relation.isAuthorOfPublication: cd28279a-5fab-4cca-a371-073e24a7971d
relation.isAuthorOfPublication: 99f438ca-9099-4e7e-91ea-1a5cbab7a1ab
relation.isAuthorOfPublication: 68de522f-fc54-440b-83c2-7374dc26f0b3
relation.isAuthorOfPublication: 5654d934-3fa0-4afb-9b3b-f2736104924c
relation.isAuthorOfPublication.latestForDiscovery: cd28279a-5fab-4cca-a371-073e24a7971d
relation.isProjectOfPublication: 67435020-fe0d-4b46-be85-59ee3c6138c7
relation.isProjectOfPublication: 79a941dc-2813-4f89-850b-4228ff5224cc
relation.isProjectOfPublication.latestForDiscovery: 67435020-fe0d-4b46-be85-59ee3c6138c7

Files

Original bundle
Name: GPT_and_Interpolation-Based_Data_Augmentation_for_Multiclass_Intrusion_Detection_in_IIoT.pdf
Size: 3.15 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.33 KB
Format: Item-specific license agreed upon to submission