Abstract(s)
In the realm of data science, event logs serve as valuable sources of information,
capturing sequences of events or activities in various processes. However, when
dealing with unlabelled event logs, the absence of a designated Case ID column poses
a critical challenge, hindering the understanding of relationships and dependencies
among events within a case or process.
Motivated by the increasing adoption of data-driven decision-making and the
need for efficient data analysis techniques, this master’s project presents the
"Case ID Column Identification Library". This library aims to streamline data
preprocessing and enhance the efficiency of subsequent data analysis tasks by
automatically identifying the Case ID column in unlabelled event logs.
The project’s objective is to develop a versatile and user-friendly library that
incorporates multiple methods, including a Convolutional Neural Network (CNN)
and a parameterizable heuristic approach, to accurately identify the Case ID column.
Users can apply individual methods or a combination of methods, depending on
their specific requirements, and can adjust the coefficients and settings of the
heuristic formula to fine-tune the identification process.
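To make the idea of a parameterizable heuristic concrete, the following is a minimal sketch and not the formula used in the project, which the abstract does not specify. It assumes two illustrative features per column (the share of distinct values and the share of rows whose value recurs) and combines them with adjustable coefficients. All names (`column_features`, `case_id_score`, `rank_columns`, `w_distinct`, `w_repeated`) and the sample log are hypothetical.

```python
from collections import Counter

def column_features(values):
    """Compute two simple statistics for one column of an event log."""
    n = len(values)
    counts = Counter(values)
    return {
        # Share of distinct values: 1.0 for a unique event ID, low for an activity name.
        "distinct_ratio": len(counts) / n,
        # Share of rows whose value occurs more than once in the column.
        "repeated_share": sum(c for c in counts.values() if c > 1) / n,
    }

def case_id_score(values, w_distinct=0.5, w_repeated=0.5):
    """Weighted sum of the features; the weights are the tunable coefficients."""
    f = column_features(values)
    return w_distinct * f["distinct_ratio"] + w_repeated * f["repeated_share"]

def rank_columns(log, **weights):
    """Return column names ordered from most to least likely Case ID."""
    columns = list(log[0].keys())
    scores = {
        col: case_id_score([row[col] for row in log], **weights)
        for col in columns
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical unlabelled log: four cases with two events each.
LOG = [
    {"col_a": f"e{i}", "col_b": f"c{(i + 1) // 2}", "col_c": "A" if i % 2 else "B"}
    for i in range(1, 9)
]

ranking = rank_columns(LOG)
print(ranking)
```

In this toy log, `col_b` groups the rows into cases, `col_a` is unique per event, and `col_c` holds only two activity labels. Changing `w_distinct` and `w_repeated` shifts how strongly the score favours many distinct values versus recurring values, which is the kind of fine-tuning the abstract describes.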
This report presents a comprehensive exploration of related work, methodology,
data understanding, methods for Case ID column identification, software library
development, and experimental results. The results demonstrate the effectiveness of
the proposed methods and their implications for decision support systems.
Keywords
Process Mining; CNN; Case ID Identification; Attribute Identification