H.6.3.3. Pattern analysis
Meysam Roostaee; Razieh Meidanshahi
Abstract
In this study, we sought to minimize the need for redundant blood tests in diagnosing common diseases by applying unsupervised data mining techniques to a large-scale dataset of blood test results from over one million patients. We excluded non-numeric and subjective data to ensure precision. To identify relationships between attributes, we applied a pipeline of preprocessing, clustering, and association rule mining. Our approach uncovered correlations that enable healthcare professionals to detect potential acute diseases early, improving patient outcomes and reducing costs. The reliability of the extracted patterns also suggests that this approach can yield significant time and cost savings while reducing the workload of laboratory personnel. Our study highlights the value of big data analytics and unsupervised learning techniques in increasing the efficiency of healthcare centers.
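As a rough illustration of the association rule mining step, the sketch below assumes blood test values have already been discretized into categorical flags; the flag names (`glucose_high`, `hba1c_high`, `ldl_high`) and the toy records are hypothetical, not taken from the study. It uses brute-force enumeration of itemsets up to size two rather than Apriori or FP-growth, which a dataset of this size would realistically require. Support is the fraction of records containing an itemset, and the confidence of a rule A -> B is supp(A and B) / supp(A).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting for itemsets up to max_size items.

    Returns {itemset_tuple: support_fraction} for itemsets meeting min_support.
    """
    n = len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(set(t))  # sorted so each itemset has one canonical tuple
        for k in range(1, max_size + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def association_rules(freq, min_confidence):
    """Derive single-item rules lhs -> rhs from frequent pairs.

    confidence = supp({lhs, rhs}) / supp({lhs}); every subset of a frequent
    pair is itself frequent, so supp({lhs}) is always present in freq.
    """
    rules = []
    for itemset, supp in freq.items():
        if len(itemset) != 2:
            continue
        for i in range(2):
            lhs, rhs = itemset[i], itemset[1 - i]
            conf = supp / freq[(lhs,)]
            if conf >= min_confidence:
                rules.append((lhs, rhs, supp, conf))
    return rules


# Hypothetical discretized records: one set of flags per patient.
records = [
    {"glucose_high", "hba1c_high"},
    {"glucose_high", "hba1c_high", "ldl_high"},
    {"glucose_high"},
    {"ldl_high"},
]
freq = frequent_itemsets(records, min_support=0.5)
rules = association_rules(freq, min_confidence=0.8)
print(rules)  # [('hba1c_high', 'glucose_high', 0.5, 1.0)]
```

Here the only rule surviving the thresholds says that every record flagged `hba1c_high` is also flagged `glucose_high`. In a real pipeline, the clustering step could be used to segment patients before mining rules within each segment.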
F.1. General
A. Telikani; A. Shahbahrami; R. Tavoli
Abstract
Data sanitization is a process used to promote the sharing of transactional databases among organizations and businesses; it alleviates the concerns of individuals and organizations about the disclosure of sensitive patterns. It transforms the source database into a released database so that counterparts cannot discover the sensitive patterns, thereby preserving data confidentiality against association rule mining. The process relies heavily on minimizing the impact of sanitization on data utility, that is, on minimizing the number of lost patterns: non-sensitive patterns that can no longer be mined from the sanitized database. This study proposes a data sanitization algorithm that hides sensitive patterns, in the form of frequent itemsets, while controlling the impact of sanitization on data utility by estimating the impact factor of each modification on non-sensitive itemsets. The proposed algorithm is compared with the Sliding Window size Algorithm (SWA) and Max-Min1 in terms of execution time, data utility, and data accuracy, where data accuracy is defined as the ratio of deleted items to the total support values of sensitive itemsets in the source dataset. Experimental results demonstrate that the proposed algorithm outperforms SWA and Max-Min1 in maximizing data utility and data accuracy, and that it achieves better execution time than both as the numbers of sensitive itemsets and transactions grow.
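A minimal sketch of itemset hiding by item deletion, assuming a sensitive itemset counts as hidden once its support count falls below a minimum count. The greedy choice of which item to delete uses a simple impact estimate: the number of non-sensitive itemsets that would lose a supporting transaction. This is a hypothetical stand-in, not the impact factor proposed in the paper, and all function names and the toy database are illustrative. The sketch also reports lost non-sensitive patterns and the deletion ratio used as the data accuracy measure.

```python
def support_count(db, itemset):
    """Number of transactions that contain every item of itemset."""
    s = set(itemset)
    return sum(1 for t in db if s <= t)


def hide_sensitive(db, sensitive, nonsensitive, min_count):
    """Greedy sketch: delete items until each sensitive itemset has support < min_count.

    Assumes min_count >= 1. The impact of deleting `item` from transaction t is
    approximated (hypothetically) as the number of non-sensitive itemsets that
    contain `item` and are currently supported by t.
    """
    db = [set(t) for t in db]  # work on a copy; the source stays intact
    deleted = 0
    for s in sensitive:
        s = set(s)
        while support_count(db, s) >= min_count:
            best = None  # (impact, transaction index, item)
            for idx, t in enumerate(db):
                if not s <= t:
                    continue
                for item in sorted(s):  # sorted for deterministic tie-breaking
                    impact = sum(1 for ns in nonsensitive
                                 if item in ns and set(ns) <= t)
                    if best is None or impact < best[0]:
                        best = (impact, idx, item)
            _, idx, item = best
            db[idx].discard(item)  # each deletion lowers support of s by one
            deleted += 1
    return db, deleted


def lost_patterns(source, sanitized, nonsensitive, min_count):
    """Non-sensitive itemsets frequent in the source but not after sanitization."""
    return [ns for ns in nonsensitive
            if support_count(source, ns) >= min_count
            and support_count(sanitized, ns) < min_count]


# Hypothetical toy database.
source = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
sensitive = [("a", "b")]
nonsensitive = [("a", "c"), ("b", "c")]

sanitized, deleted = hide_sensitive(source, sensitive, nonsensitive, min_count=2)
lost = lost_patterns(source, sanitized, nonsensitive, min_count=2)
# Ratio of deleted items to total source support of sensitive itemsets.
accuracy = deleted / sum(support_count(source, s) for s in sensitive)
print(deleted, lost, accuracy)
```

On this toy input, two deletions suffice to hide `("a", "b")`. The greedy rule first removes `a` from the transaction `{"a", "b"}`, which supports no non-sensitive itemset, so no non-sensitive pattern is lost.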