Publication
A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs
| datacite.subject.fos | Natural Sciences::Computer and Information Sciences | |
| datacite.subject.sdg | 08: Decent Work and Economic Growth | |
| datacite.subject.sdg | 09: Industry, Innovation and Infrastructure | |
| dc.contributor.author | Vicente, André | |
| dc.contributor.author | Grilo, Carlos | |
| dc.contributor.author | Rijo, Rui | |
| dc.contributor.author | Martinho, Ricardo | |
| dc.date.accessioned | 2026-04-28T12:44:30Z | |
| dc.date.available | 2026-04-28T12:44:30Z | |
| dc.date.issued | 2026-03-19 | |
| dc.date.updated | 2026-04-23T12:06:27Z | |
| dc.description | Article number: 100762. | |
| dc.description.abstract | This study addresses the critical challenge of identifying and labelling the case ID attribute in unlabelled event logs, a fundamental task in process mining. Case IDs uniquely associate events with individual process instances, enabling accurate analysis and discovery of operational insights. Manual identification of case IDs is error-prone and labour-intensive, often hindering the scalability and reliability of process mining analyses. This paper introduces a novel heuristic method that automates case ID identification, improving efficiency and accuracy for diverse real-world datasets. The proposed heuristic leverages unique temporal patterns observed in event logs to distinguish case ID attributes from other attributes. It calculates a weighted average of temporal spans and applies customisable parameters to prioritise relevant attributes. The method was validated using 27 datasets from the Business Process Intelligence (BPI) Challenge, representing a variety of industries and event log complexities. Performance metrics, including success rates and computational efficiency, were benchmarked against existing approaches. The heuristic achieved an 85.2% top-1 success rate, and remains effective provided at least one repeating categorical attribute is present - a condition met by virtually all publicly available business and industrial logs. It consistently ranked case IDs among the top attributes even in challenging scenarios, such as cyclic processes and multi-correlated data. The method demonstrated robustness across diverse datasets, processing large event logs within seconds, highlighting its practicality for real-world applications. This research contributes an innovative and explainable approach to case ID identification that requires only raw event logs, contrasting with existing methods reliant on pre-labelled data or complex pipelines. Its simplicity, efficiency, and adaptability to various process types make it a valuable tool for advancing process mining capabilities. | eng |
| dc.description.version | N/A | |
| dc.identifier.citation | Vicente, A., Grilo, C., Rijo, R., & Martinho, R. (2026). A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs. Array, 30, 100762. https://doi.org/10.1016/j.array.2026.100762 | |
| dc.identifier.doi | 10.1016/j.array.2026.100762 | en_US |
| dc.identifier.issn | 2590-0056 | |
| dc.identifier.slug | cv-prod-5021475 | |
| dc.identifier.uri | http://hdl.handle.net/10400.8/16211 | |
| dc.language.iso | eng | |
| dc.peerreviewed | yes | |
| dc.publisher | Elsevier | |
| dc.relation.hasversion | https://www.sciencedirect.com/science/article/pii/S2590005626000858 | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | Process mining | |
| dc.subject | Unlabelled event logs | |
| dc.subject | Case ID identification | |
| dc.subject | Heuristics | |
| dc.title | A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs | eng |
| dc.type | research article | en_US |
| dspace.entity.type | Publication | |
| oaire.citation.endPage | 12 | |
| oaire.citation.startPage | 1 | |
| oaire.citation.title | Array | en_US |
| oaire.citation.volume | 30 | |
| oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 | |
| person.familyName | Rijo | |
| person.familyName | Martinho | |
| person.givenName | Rui Pedro Charters Lopes | |
| person.givenName | Ricardo | |
| person.identifier.ciencia-id | E71D-3237-849C | |
| person.identifier.ciencia-id | F51E-9BB5-EF92 | |
| person.identifier.orcid | 0000-0002-9348-0474 | |
| person.identifier.orcid | 0000-0003-1157-7510 | |
| person.identifier.rid | K-8277-2013 | |
| person.identifier.scopus-author-id | 36861366200 | |
| person.identifier.scopus-author-id | 25823103700 | |
| rcaap.cv.cienciaid | F51E-9BB5-EF92 | Ricardo Martinho | |
| rcaap.rights | openAccess | en_US |
| relation.isAuthorOfPublication | e69d7599-392c-4f8f-a96a-bf0a0d15c8b1 | |
| relation.isAuthorOfPublication | b2a74e46-f06c-4dcd-8c64-8f78f1d55440 | |
| relation.isAuthorOfPublication.latestForDiscovery | e69d7599-392c-4f8f-a96a-bf0a0d15c8b1 |
Files
Main
- Name:
- A simple heuristic for the identification of the case ID attribute_.pdf
- Size:
- 2.33 MB
- Format:
- Adobe Portable Document Format
License
- Name:
- license.txt
- Size:
- 1.33 KB
- Format:
- Item-specific license agreed upon to submission