Fast COVID-19 Infection Prediction with In-House Data Using Machine Learning Classification Algorithms: A Case Study of Iran

Shabrandi, Ali Rebwar; Rajabzadeh Ghatari, Ali; Tavakoli, Nader; Dehghan Nayeri, Mohammad; Mirzaei, Sahar

doi:10.22044/jadm.2023.13291.2458

Document Type: Original/Review Paper

Authors

¹ Department of Industrial Management, Faculty of Management and Economics, Tarbiat Modares University, Tehran, Iran.

² Department of Emergency Medicine, Trauma and Injury Research Center, Iran University of Medical Sciences, Tehran, Iran.

³ Department of Health and Environment, Iran University of Medical Sciences, Tehran, Iran.

Abstract

To mitigate the overwhelming burden of COVID-19, a rapid and efficient early screening scheme is required at the first line of care. Much prior research has relied on laboratory tests, CT scans, and X-ray data, which are obstacles to agile, real-time screening. In this study, we propose a user-friendly, low-cost COVID-19 detection model based on data that can be self-reported at home. An exhaustive set of input features was identified and grouped into four categories: demographic, symptom, semi-clinical, and past/present disease data. We employed grid search to identify the combination of hyperparameter settings that yields the most accurate prediction. Next, we applied the proposed model with tuned hyperparameters to 11 classic and state-of-the-art classifiers. The results show that the XGBoost classifier provides the highest accuracy, 73.3%. Statistical analysis shows no significant difference between the accuracy of XGBoost and AdaBoost, but it confirms the superiority of these two methods over the others. Furthermore, the most important features obtained using SHapley Additive exPlanations (SHAP) were analyzed. “Contact with infected people,” “cough,” “muscle pain,” “fever,” “age,” “cardiovascular comorbidities,” “PO2,” and “respiratory distress” are the most important variables. Among these, the first three have a relatively large positive impact on the target variable, whereas “age,” “PO2,” and “respiratory distress” are highly negatively correlated with it. Finally, we built a clinically operable, visible, and easy-to-interpret decision tree model to predict COVID-19 infection.
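The grid search step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not give the search grid, the feature encoding, or the validation scheme, so everything below is an assumption made for illustration. A small k-nearest-neighbours classifier stands in for the 11 classifiers of the study, the data are synthetic, and the names `make_toy_data`, `knn_predict`, `cross_val_accuracy`, and `grid_search` are hypothetical.

```python
import itertools
import random


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in data: three binary features and a noisy binary label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.randint(0, 1) for _ in range(3)]  # e.g. contact, cough, fever
        label = 1 if sum(row) >= 2 else 0
        if rng.random() < 0.15:  # flip roughly 15% of labels to mimic noise
            label = 1 - label
        X.append(row)
        y.append(label)
    return X, y


def knn_predict(X_train, y_train, x, k, threshold):
    """Predict 1 if the share of positive labels among the k nearest rows
    (Hamming distance) reaches `threshold`."""
    order = sorted(
        range(len(X_train)),
        key=lambda i: sum(a != b for a, b in zip(X_train[i], x)),
    )
    positive_votes = sum(y_train[i] for i in order[:k])
    return 1 if positive_votes / k >= threshold else 0


def cross_val_accuracy(X, y, k, threshold, folds=5):
    """Mean accuracy over `folds` held-out folds (row i goes to fold i % folds)."""
    correct = 0
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        test = [i for i in range(len(X)) if i % folds == f]
        X_tr = [X[i] for i in train]
        y_tr = [y[i] for i in train]
        for i in test:
            correct += knn_predict(X_tr, y_tr, X[i], k, threshold) == y[i]
    return correct / len(X)


def grid_search(X, y, grid):
    """Score every combination in `grid` and return the best one."""
    names = list(grid)
    results = []
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, cross_val_accuracy(X, y, **params)))
    best_params, best_score = max(results, key=lambda r: r[1])
    return best_params, best_score, results


# Hypothetical grid: 4 values of k times 3 thresholds = 12 combinations.
grid = {"k": [1, 3, 5, 7], "threshold": [0.3, 0.5, 0.7]}
X, y = make_toy_data()
best_params, best_score, results = grid_search(X, y, grid)
print(best_params, round(best_score, 3))
```

Exhaustive search of this kind scales with the product of the grid sizes, which is why it is usually limited to a few hyperparameters per classifier. The cross-validated score is used for selection so that the chosen setting is not simply the one that best fits the training rows.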

Keywords

Main Subjects

H.3. Artificial Intelligence

Volume 11, Issue 4
October 2023
Pages 573-585