A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs

Authors: Vicente, André; Grilo, Carlos; Rijo, Rui; Martinho, Ricardo
Dates: 2026-03-19 (issued); 2026-04-28 (accessioned and made available)
Citation: Vicente, A., Grilo, C., Rijo, R., & Martinho, R. (2026). A simple heuristic for the identification of the case ID attribute in unlabelled process mining event logs. Array, 30, 100762. https://doi.org/10.1016/j.array.2026.100762
ISSN: 2590-0056
Handle: http://hdl.handle.net/10400.8/16211
Article number: 100762.

This study addresses the critical challenge of identifying and labelling the case ID attribute in unlabelled event logs, a fundamental task in process mining. Case IDs uniquely associate events with individual process instances, which enables accurate analysis and the discovery of operational insights. Manual identification of case IDs is error-prone and labour-intensive, and it often limits the scalability and reliability of process mining analyses. This paper introduces a novel heuristic method that automates case ID identification and improves efficiency and accuracy across diverse real-world datasets. The proposed heuristic exploits distinctive temporal patterns in event logs to distinguish case ID attributes from other attributes. It calculates a weighted average of temporal spans and applies customisable parameters to prioritise relevant attributes. The method was validated on 27 datasets from the Business Process Intelligence (BPI) Challenge, covering a variety of industries and event log complexities. Performance metrics, including success rates and computational efficiency, were benchmarked against existing approaches. The heuristic achieved an 85.2% top-1 success rate and remains effective provided that at least one repeating categorical attribute is present; virtually all publicly available business and industrial logs meet this condition. It consistently ranked case IDs among the top attributes, even in challenging scenarios such as cyclic processes and multi-correlated data.
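The abstract describes the heuristic only at a high level: it measures temporal spans per attribute, averages them with weights, and filters candidates with customisable parameters. The Python sketch below is one plausible reading under stated assumptions and is not the authors' implementation. It assumes that the timestamp attribute is already known, that each value's span is its last timestamp minus its first, that the weights are event counts per value, and that spans are normalised by the whole log's span, with lower scores ranking higher. The function names and the parameters `min_events_per_value` and `min_distinct` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def weighted_value_span(events, attr, ts_key="timestamp"):
    """Return (weighted mean span in seconds, distinct values, events) for one attribute.

    Hypothetical scoring: group events by their value of `attr`; each group's span
    (last minus first timestamp) is weighted by the group's event count.
    """
    groups = defaultdict(list)
    for ev in events:
        if ev.get(attr) is not None:
            groups[ev[attr]].append(ev[ts_key])
    n_events = sum(len(stamps) for stamps in groups.values())
    if n_events == 0:
        return 0.0, 0, 0
    weighted_sum = sum(
        (max(stamps) - min(stamps)).total_seconds() * len(stamps)
        for stamps in groups.values()
    )
    return weighted_sum / n_events, len(groups), n_events


def rank_case_id_candidates(events, ts_key="timestamp",
                            min_events_per_value=2.0, min_distinct=2):
    """Rank attributes from most to least case-ID-like (hypothetical parameters).

    Assumption: a case ID groups events into instances that are short relative to
    the whole log, whereas activities or resources recur across the entire log.
    """
    stamps = [ev[ts_key] for ev in events]
    log_span = (max(stamps) - min(stamps)).total_seconds()
    attrs = {key for ev in events for key in ev if key != ts_key}
    scored = []
    for attr in attrs:
        span, n_values, n_events = weighted_value_span(events, attr, ts_key)
        # Skip attributes whose values rarely repeat (e.g. event IDs) or that
        # take too few distinct values (e.g. a constant flag).
        if (n_values == 0 or n_values < min_distinct
                or n_events / n_values < min_events_per_value):
            continue
        score = span / log_span if log_span > 0 else 0.0
        scored.append((score, attr))
    scored.sort()  # lowest normalised span first; ties broken by attribute name
    return [attr for _, attr in scored]


def top1_success_rate(rankings, true_case_ids):
    """Fraction of logs whose highest-ranked attribute is the true case ID."""
    hits = sum(1 for ranking, truth in zip(rankings, true_case_ids)
               if ranking and ranking[0] == truth)
    return hits / len(true_case_ids)


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1)
    offsets = [timedelta(0), timedelta(hours=1), timedelta(days=5),
               timedelta(days=5, hours=2), timedelta(days=10),
               timedelta(days=10, hours=3)]
    # Toy log: three two-event cases spread over ten days.
    log = [
        {"event_id": f"e{i}", "case": f"c{i // 2 + 1}",
         "activity": "AB"[i % 2], "resource": f"r{i % 3 + 1}",
         "timestamp": t0 + offsets[i]}
        for i in range(6)
    ]
    print(rank_case_id_candidates(log))
```

In the toy log, `case` ranks first because each case spans only a few hours, while the values of `activity` and `resource` recur across the whole ten-day log. `event_id` is filtered out because none of its values repeat, which mirrors the abstract's requirement for at least one repeating categorical attribute. If the top-1 rate is computed per dataset, 85.2% of 27 logs corresponds to 23 logs with the case ID ranked first (23/27 ≈ 0.852).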
The method demonstrated robustness across diverse datasets and processed large event logs within seconds, which highlights its practicality for real-world applications. This research contributes an innovative, explainable approach to case ID identification that requires only raw event logs, in contrast to existing methods that rely on pre-labelled data or complex pipelines. Its simplicity, efficiency, and adaptability to various process types make it a valuable tool for advancing process mining capabilities.

Language: English
Keywords: Process mining; Unlabelled event logs; Case ID identification; Heuristics
Type: research article
Record date: 2026-04-23