Publication

A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs

datacite.subject.fos: Natural Sciences::Computer and Information Sciences
datacite.subject.sdg: 08:Decent Work and Economic Growth
datacite.subject.sdg: 09:Industry, Innovation and Infrastructure
dc.contributor.author: Vicente, André
dc.contributor.author: Grilo, Carlos
dc.contributor.author: Rijo, Rui
dc.contributor.author: Martinho, Ricardo
dc.date.accessioned: 2026-04-28T12:44:30Z
dc.date.available: 2026-04-28T12:44:30Z
dc.date.issued: 2026-03-19
dc.date.updated: 2026-04-23T12:06:27Z
dc.description: Article number: 100762.
dc.description.abstract: This study addresses the critical challenge of identifying and labelling the case ID attribute in unlabelled event logs, a fundamental task in process mining. Case IDs uniquely associate events with individual process instances, enabling accurate analysis and discovery of operational insights. Manual identification of case IDs is error-prone and labour-intensive, often hindering the scalability and reliability of process mining analyses. This paper introduces a novel heuristic method that automates case ID identification, improving efficiency and accuracy for diverse real-world datasets. The proposed heuristic leverages unique temporal patterns observed in event logs to distinguish case ID attributes from other attributes. It calculates a weighted average of temporal spans and applies customisable parameters to prioritise relevant attributes. The method was validated using 27 datasets from the Business Process Intelligence (BPI) Challenge, representing a variety of industries and event log complexities. Performance metrics, including success rates and computational efficiency, were benchmarked against existing approaches. The heuristic achieved an 85.2% top-1 success rate, and remains effective provided at least one repeating categorical attribute is present, a condition met by virtually all publicly available business and industrial logs. It consistently ranked case IDs among the top attributes even in challenging scenarios, such as cyclic processes and multi-correlated data. The method demonstrated robustness across diverse datasets, processing large event logs within seconds, highlighting its practicality for real-world applications. This research contributes an innovative and explainable approach to case ID identification that requires only raw event logs, contrasting with existing methods reliant on pre-labelled data or complex pipelines. Its simplicity, efficiency, and adaptability to various process types make it a valuable tool for advancing process mining capabilities. [eng]
dc.description.version: N/A
dc.identifier.citation: Vicente, A., Grilo, C., Rijo, R., & Martinho, R. (2026). A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs. Array, 30, 100762. https://doi.org/10.1016/j.array.2026.100762
dc.identifier.doi: 10.1016/j.array.2026.100762 [en_US]
dc.identifier.issn: 2590-0056
dc.identifier.slug: cv-prod-5021475
dc.identifier.uri: http://hdl.handle.net/10400.8/16211
dc.language.iso: eng
dc.peerreviewed: yes
dc.publisher: Elsevier
dc.relation.hasversion: https://www.sciencedirect.com/science/article/pii/S2590005626000858
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Process mining
dc.subject: Unlabelled event logs
dc.subject: Case ID identification
dc.subject: Heuristics
dc.title: A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs [eng]
dc.type: research article [en_US]
dspace.entity.type: Publication
oaire.citation.endPage: 12
oaire.citation.startPage: 1
oaire.citation.title: Array [en_US]
oaire.citation.volume: 30
oaire.version: http://purl.org/coar/version/c_970fb48d4fbd8a85
person.familyName: Rijo
person.familyName: Martinho
person.givenName: Rui Pedro Charters Lopes
person.givenName: Ricardo
person.identifier.ciencia-id: E71D-3237-849C
person.identifier.ciencia-id: F51E-9BB5-EF92
person.identifier.orcid: 0000-0002-9348-0474
person.identifier.orcid: 0000-0003-1157-7510
person.identifier.rid: K-8277-2013
person.identifier.scopus-author-id: 36861366200
person.identifier.scopus-author-id: 25823103700
rcaap.cv.cienciaid: F51E-9BB5-EF92 | Ricardo Martinho
rcaap.rights: openAccess [en_US]
relation.isAuthorOfPublication: e69d7599-392c-4f8f-a96a-bf0a0d15c8b1
relation.isAuthorOfPublication: b2a74e46-f06c-4dcd-8c64-8f78f1d55440
relation.isAuthorOfPublication.latestForDiscovery: e69d7599-392c-4f8f-a96a-bf0a0d15c8b1

Files

Main
Name: A simple heuristic for the identification of the case ID attribute_.pdf
Size: 2.33 MB
Format: Adobe Portable Document Format

License
Name: license.txt
Size: 1.33 KB
Format: Item-specific license agreed upon to submission
Description: