A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs

Abstract

This study addresses the critical challenge of identifying and labelling the case ID attribute in unlabelled event logs, a fundamental task in process mining. Case IDs uniquely associate events with individual process instances, enabling accurate analysis and discovery of operational insights. Manual identification of case IDs is error-prone and labour-intensive, often hindering the scalability and reliability of process mining analyses. This paper introduces a novel heuristic method that automates case ID identification, improving efficiency and accuracy for diverse real-world datasets. The proposed heuristic leverages unique temporal patterns observed in event logs to distinguish case ID attributes from other attributes. It calculates a weighted average of temporal spans and applies customisable parameters to prioritise relevant attributes. The method was validated using 27 datasets from the Business Process Intelligence (BPI) Challenge, representing a variety of industries and event log complexities. Performance metrics, including success rates and computational efficiency, were benchmarked against existing approaches. The heuristic achieved an 85.2% top-1 success rate and remains effective provided at least one repeating categorical attribute is present, a condition met by virtually all publicly available business and industrial logs. It consistently ranked case IDs among the top attributes even in challenging scenarios, such as cyclic processes and multi-correlated data. The method demonstrated robustness across diverse datasets, processing large event logs within seconds, highlighting its practicality for real-world applications. This research contributes an innovative and explainable approach to case ID identification that requires only raw event logs, contrasting with existing methods reliant on pre-labelled data or complex pipelines. Its simplicity, efficiency, and adaptability to various process types make it a valuable tool for advancing process mining capabilities.
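
The abstract does not state the scoring formula, so the following is only an illustrative sketch of how a weighted average of temporal spans might be used to rank candidate attributes; it is not the authors' implementation. It assumes that values of a case ID attribute each cover a short time window relative to the whole log, that attributes whose values never repeat (such as event IDs) are excluded, and that groups are weighted by their event count. The function name `rank_case_id_candidates`, the parameter `min_repeat_ratio`, and the sample log are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def rank_case_id_candidates(events, timestamp_key, min_repeat_ratio=0.5):
    """Rank attributes by weighted-average temporal span, lowest first.

    Illustrative assumption: a lower normalised span suggests a case ID.
    """
    times = [e[timestamp_key] for e in events]
    # Total duration of the log, used to normalise spans (avoid dividing by 0).
    log_span = (max(times) - min(times)).total_seconds() or 1.0
    candidates = {k for e in events for k in e if k != timestamp_key}
    scores = {}
    for attr in candidates:
        # Collect the timestamps of all events sharing each attribute value.
        groups = defaultdict(list)
        for e in events:
            if attr in e:
                groups[e[attr]].append(e[timestamp_key])
        n = sum(len(ts) for ts in groups.values())
        repeated = sum(len(ts) for ts in groups.values() if len(ts) > 1)
        # Skip attributes whose values rarely repeat (assumed filter).
        if n == 0 or repeated / n < min_repeat_ratio:
            continue
        # Span of each value, weighted by the number of events in its group.
        weighted = sum(
            len(ts) * (max(ts) - min(ts)).total_seconds()
            for ts in groups.values()
        )
        scores[attr] = weighted / (n * log_span)
    return sorted(scores, key=scores.get)


# Hypothetical toy log: three cases, each with activities A, B, C.
start = datetime(2024, 1, 1)
events = []
for case_index, case in enumerate(["c1", "c2", "c3"]):
    for step, activity in enumerate(["A", "B", "C"]):
        events.append({
            "event_id": f"e{len(events)}",
            "case_id": case,
            "activity": activity,
            "timestamp": start + timedelta(hours=10 * case_index + step),
        })

ranking = rank_case_id_candidates(events, "timestamp")
print(ranking)  # ['case_id', 'activity']
```

In this toy log each case spans two hours of a 22-hour log, while each activity recurs across all cases and spans 20 hours, so `case_id` ranks first; `event_id` is dropped because none of its values repeat.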

Description

Article number: 100762.

Keywords

Process mining; Unlabelled event logs; Case ID identification; Heuristics

Citation

Vicente, A., Grilo, C., Rijo, R., & Martinho, R. (2026). A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs. Array, 30, 100762. https://doi.org/10.1016/j.array.2026.100762