H.3. Artificial Intelligence
Vahideh Monemizadeh; Kourosh Kiani
Abstract
Anomaly detection is becoming increasingly crucial across various fields, including cybersecurity, financial risk management, and health monitoring. However, it faces significant challenges when dealing with large-scale, high-dimensional, and unlabeled datasets. This study focuses on decision tree-based methods for anomaly detection because of their scalability, interpretability, and effectiveness in managing high-dimensional data. Although Isolation Forest (iForest) and its extended variant, Extended Isolation Forest (EIF), are widely used, they exhibit limitations in identifying anomalies, particularly in handling normal data distributions and preventing the formation of ghost clusters. The Rotated Isolation Forest (RIF) was developed to address these challenges, enhancing the model's ability to distinguish true anomalies from normal variations by employing randomized rotations in feature space. Building on this approach, we propose the Discrete Rotated Isolation Forest (DRIF) model, which integrates an Autoencoder for dimensionality reduction. Using a discrete probability distribution together with the Autoencoder enhances computational efficiency. Experimental evaluations on synthetic and real-world datasets demonstrate that the proposed model outperforms iForest, EIF, and RIF, achieving higher Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) scores and significantly faster execution times. These findings establish the proposed model as a robust, scalable, and efficient approach for unsupervised anomaly detection in high-dimensional datasets.
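The idea of isolating points after a random rotation of the feature space can be illustrated with a minimal sketch. This is not the authors' DRIF or RIF implementation: it omits the Autoencoder, the discrete probability distribution, subsampling, and score normalization, and it assumes 2-D data with one uniformly random rotation angle per tree. The function names (`random_rotation_2d`, `path_length`, `mean_path_length`) and the toy data are hypothetical.

```python
import math
import random

def random_rotation_2d(rng):
    """Return a function that rotates a 2-D point by one random angle (assumed scheme)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1], s * p[0] + c * p[1])

def path_length(point, data, rng, depth=0, max_depth=10):
    """Number of random axis-aligned splits needed to isolate `point` (capped)."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    axis = rng.randrange(2)
    lo = min(p[axis] for p in data)
    hi = max(p[axis] for p in data)
    if lo == hi:  # no split possible along this axis
        return depth
    split = rng.uniform(lo, hi)
    # Keep only the points that fall on the same side of the split as `point`.
    if point[axis] < split:
        side = [p for p in data if p[axis] < split]
    else:
        side = [p for p in data if p[axis] >= split]
    return path_length(point, side, rng, depth + 1, max_depth)

def mean_path_length(point, data, n_trees=200, seed=0):
    """Average isolation depth over trees, each built in a freshly rotated space."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        rotate = random_rotation_2d(rng)
        rotated = [rotate(p) for p in data]
        total += path_length(rotate(point), rotated, rng)
    return total / n_trees

# Hypothetical toy data: a tight 7x7 grid around the origin plus one distant point.
cluster = [(0.1 * i, 0.1 * j) for i in range(-3, 4) for j in range(-3, 4)]
outlier = (5.0, 5.0)
data = cluster + [outlier]

inlier_depth = mean_path_length((0.0, 0.0), data)
outlier_depth = mean_path_length(outlier, data)
print(f"inlier: {inlier_depth:.2f}, outlier: {outlier_depth:.2f}")
```

A shorter average path means a point is easier to isolate and therefore more anomalous; in this toy setting the distant point should be isolated in far fewer splits than the point at the center of the grid.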
H.6.3.2. Feature evaluation and selection
Farhad Abedinzadeh Torghabeh; Yeganeh Modaresnia; Seyyed Abed Hosseini
Abstract
In many data analysis studies, it has recently become necessary to find and select relevant features without class labels using Unsupervised Feature Selection (UFS) approaches. Although several open-source toolboxes provide feature selection techniques to reduce redundant features, data dimensionality, and computation costs, these approaches require programming knowledge, which limits their popularity, and they have not adequately addressed unlabeled real-world data. The Automatic UFS Toolbox (Auto-UFSTool) for MATLAB, proposed in this study, is a user-friendly and fully automatic toolbox that utilizes several UFS approaches from the most recent research. It is a collection of 25 robust UFS approaches, most of which were developed within the last five years. A clear and systematic comparison of competing methods is therefore feasible without writing a single line of code. Even users without any previous programming experience can use the implementations through the Graphical User Interface (GUI). The toolbox also makes it possible to evaluate the feature selection results and generate graphs that facilitate the comparison of subsets of varying sizes. It is freely accessible in the MATLAB File Exchange repository and includes scripts and source code for each technique. The toolbox is publicly available at: bit.ly/AutoUFSTool
H.6.3.3. Pattern analysis
Meysam Roostaee; Razieh Meidanshahi
Abstract
In this study, we sought to minimize the need for redundant blood tests in diagnosing common diseases by applying unsupervised data mining techniques to a large-scale dataset of over one million patients' blood test results. We excluded non-numeric and subjective data to ensure precision. To identify relationships between attributes, we applied a suite of unsupervised methods including preprocessing, clustering, and association rule mining. Our approach uncovered correlations that enable healthcare professionals to detect potential acute diseases early, improving patient outcomes and reducing costs. The reliability of our extracted patterns also suggests that this approach can lead to significant time and cost savings while reducing the workload for laboratory personnel. Our study highlights the importance of big data analytics and unsupervised learning techniques in increasing efficiency in healthcare centers.
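As a rough illustration of the association rule mining step, the sketch below computes the support and confidence of a single rule over discretized test results. It is only an assumed setup, not the study's pipeline: the records, the item labels such as `glucose=high`, and the helper `rule_metrics` are hypothetical, and the preprocessing and clustering stages are not shown.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    # Records containing the antecedent, and records containing both sides.
    n_ante = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / n
    confidence = n_both / n_ante if n_ante else 0.0
    return support, confidence

# Hypothetical discretized blood test records, one set of items per patient.
records = [
    frozenset({"glucose=high", "hba1c=high"}),
    frozenset({"glucose=high", "hba1c=high", "ldl=high"}),
    frozenset({"glucose=normal", "hba1c=normal"}),
    frozenset({"glucose=high", "hba1c=normal"}),
]

support, confidence = rule_metrics(records, {"glucose=high"}, {"hba1c=high"})
print(f"support={support:.2f}, confidence={confidence:.2f}")
```

A rule with high confidence indicates that one test result is largely predictable from another, which is the kind of relationship that could flag a test as potentially redundant.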