| Name | Description | Size | Format |
|---|---|---|---|
| | | 2.33 MB | Adobe PDF |
Advisor(s)
Abstract(s)
This study addresses the critical challenge of identifying and labelling the case ID attribute in unlabelled event logs, a fundamental task in process mining. Case IDs uniquely associate events with individual process instances, enabling accurate analysis and discovery of operational insights. Manual identification of case IDs is error-prone and labour-intensive, often hindering the scalability and reliability of process mining analyses. This paper introduces a novel heuristic method that automates case ID identification, improving efficiency and accuracy for diverse real-world datasets. The proposed heuristic leverages unique temporal patterns observed in event logs to distinguish case ID attributes from other attributes. It calculates a weighted average of temporal spans and applies customisable parameters to prioritise relevant attributes. The method was validated using 27 datasets from the Business Process Intelligence (BPI) Challenge, representing a variety of industries and event log complexities. Performance metrics, including success rates and computational efficiency, were benchmarked against existing approaches. The heuristic achieved an 85.2% top-1 success rate and remains effective provided at least one repeating categorical attribute is present, a condition met by virtually all publicly available business and industrial logs. It consistently ranked case IDs among the top attributes even in challenging scenarios, such as cyclic processes and multi-correlated data. The method demonstrated robustness across diverse datasets, processing large event logs within seconds, highlighting its practicality for real-world applications. This research contributes an innovative and explainable approach to case ID identification that requires only raw event logs, contrasting with existing methods reliant on pre-labelled data or complex pipelines. Its simplicity, efficiency, and adaptability to various process types make it a valuable tool for advancing process mining capabilities.
Description
Article number: 100762.
Keywords
Process mining; Unlabelled event logs; Case ID identification; Heuristics
Educational Context
Citation
Vicente, A., Grilo, C., Rijo, R., & Martinho, R. (2026). A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs. Array, 30, 100762. https://doi.org/10.1016/j.array.2026.100762
Publisher
Elsevier
