Document Type: Original/Review Paper


1 Department of Computer Engineering, University of Mazandaran, Babolsar, Iran.

2 Department of Computer Engineering, Polytechnic University of Turin, Turin, Italy.


In this study, we sought to minimize the need for redundant blood tests in diagnosing common diseases by applying unsupervised data mining techniques to a large-scale dataset of blood test results from over one million patients. We excluded non-numeric and subjective data to ensure precision. To identify relationships between attributes, we preprocessed the data and then applied unsupervised methods, including clustering and association rule mining. Our approach uncovered correlations that can help healthcare professionals detect potential acute diseases early, improving patient outcomes and reducing costs. The reliability of the extracted patterns also suggests that this approach can lead to significant time and cost savings while reducing the workload of laboratory personnel. Our study highlights the importance of big data analytics and unsupervised learning techniques in increasing the efficiency of healthcare centers.
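As a rough illustration of the association rule mining step mentioned above, the following sketch computes support and confidence for single-item rules over discretized test results. It is a minimal example under stated assumptions, not the method used in the study: the test names (HGB, MCV, FERRITIN), the low/normal discretization, the toy records, and the thresholds are all hypothetical.

```python
from itertools import combinations

# Hypothetical, hand-made records: each set holds the discretized test
# results of one patient. Labels and values are illustrative only.
records = [
    {"HGB=low", "MCV=low", "FERRITIN=low"},
    {"HGB=low", "MCV=low", "FERRITIN=low"},
    {"HGB=low", "MCV=normal", "FERRITIN=normal"},
    {"HGB=normal", "MCV=normal", "FERRITIN=normal"},
    {"HGB=normal", "MCV=normal", "FERRITIN=normal"},
]


def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)


def pair_rules(records, min_support=0.3, min_confidence=0.9):
    """Return (antecedent, consequent, support, confidence) for one-to-one rules."""
    items = sorted(set().union(*records))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b}, records)
        if s < min_support:
            continue  # the pair is too rare to be worth reporting
        for lhs, rhs in ((a, b), (b, a)):
            conf = s / support({lhs}, records)
            if conf >= min_confidence:
                rules.append((lhs, rhs, s, conf))
    return rules


rules = pair_rules(records)
for lhs, rhs, s, conf in rules:
    print(f"{lhs} -> {rhs}  support={s:.2f}  confidence={conf:.2f}")
```

In this toy data, a high-confidence rule means the consequent value was predictable from the antecedent, which is the kind of relationship that could flag a potentially redundant test. Real use would need clinically validated discretization thresholds and a full frequent-itemset algorithm rather than this pairwise scan.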


