Document Type: Original/Review Paper


1 Department of Computer Science, University of Ibadan, Ibadan, Nigeria

2 Department of Computer Science and Information Technology, Bowen University, Iwo, Nigeria


Breast cancer is the second leading cause of death and accounts for 16% of all cancer deaths worldwide. Most methods of detecting breast cancer, such as mammography, are expensive and difficult to interpret. They also have limitations, including cumulative radiation exposure, over-diagnosis, and false positives and false negatives in women with dense breasts, which create uncertainty in high-risk populations. The objective of this study is to detect breast cancer from blood analysis data using classification algorithms, as a complement to these expensive methods. High-ranking features were extracted from the dataset. The KNN, SVM and J48 algorithms were trained to classify 116 instances. Both 10-fold cross-validation and holdout procedures were used, each repeated with different random seeds. The results showed that KNN had the highest accuracy: 89.99% for cross-validation and 85.21% for holdout. J48 followed with 84.65% and 75.65% respectively, and SVM achieved 77.58% and 68.69%. Although blood glucose level was found to be a major determinant in detecting breast cancer, it must be combined with other attributes to reach a decision, because elevated glucose can also result from other health conditions such as diabetes. Based on the model generated in this study, women are advised to have regular check-ups that include blood analysis, in order to learn which blood components need attention to prevent breast cancer.
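The evaluation protocol described in the abstract (10-fold cross-validation and a holdout split, each repeated under different random seeds) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The data are synthetic stand-ins rather than the study's blood analysis dataset, only a plain KNN classifier is shown, and the function names, k = 5, the 34% holdout fraction and the seed range are all assumptions made for illustration. The accuracies it produces are therefore not comparable to the figures reported above.

```python
import math
import random
from collections import Counter


def make_synthetic(n=116, n_features=4, seed=0):
    """Hypothetical stand-in data: two Gaussian classes, NOT the study's dataset."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        label = i % 2  # alternate labels so the two classes are balanced
        features = [rng.gauss(1.5 * label, 1.0) for _ in range(n_features)]
        rows.append((features, label))
    return rows


def knn_predict(train, x, k=5):
    """Majority vote among the k training points nearest to x (Euclidean)."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


def cross_val_accuracy(data, folds=10, seed=0, k=5):
    """Shuffle with the seed, split into `folds` parts, average fold accuracy."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    scores = []
    for f in range(folds):
        test = rows[f::folds]  # every folds-th row, starting at offset f
        train = [r for i, r in enumerate(rows) if i % folds != f]
        scores.append(accuracy(train, test, k))
    return sum(scores) / folds


def holdout_accuracy(data, test_fraction=0.34, seed=0, k=5):
    """Shuffle with the seed, hold out a fraction for testing, train on the rest."""
    rows = data[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return accuracy(rows[n_test:], rows[:n_test], k)


def average_over_seeds(evaluate, data, seeds=range(1, 11)):
    """Repeat an evaluation under several random seeds and average the scores."""
    scores = [evaluate(data, seed=s) for s in seeds]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    data = make_synthetic()
    print("10-fold CV accuracy:", average_over_seeds(cross_val_accuracy, data))
    print("Holdout accuracy:   ", average_over_seeds(holdout_accuracy, data))
```

Averaging over seeds reduces the dependence of the reported accuracy on any single shuffle, which matters for a dataset of only 116 instances. The folds here are not stratified by class; a stratified split would be a reasonable refinement.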

