TY  - JOUR
ID  - 1058
TI  - Extracting Predictor Variables to Construct Breast Cancer Survivability Model with Class Imbalance Problem
JO  - Journal of AI and Data Mining
JA  - JADM
LA  - en
SN  - 2322-5211
AU  - Miri Rostami, S.
AU  - Ahmadzadeh, M.
AD  - Faculty of Computer and IT Engineering, Shiraz University of Technology, Shiraz, Iran.
Y1  - 2018
PY  - 2018
VL  - 6
IS  - 2
SP  - 263
EP  - 276
KW  - Breast Cancer
KW  - survival
KW  - Class Imbalance Problem
KW  - oversampling technique
KW  - Feature Selection
DO  - 10.22044/jadm.2017.5061.1609
N2  - Applying data mining methods as a decision support system is of great benefit for predicting the survival of new patients. It also has great potential for health researchers investigating the relationship between risk factors and cancer survival. However, because datasets associated with breast cancer survival are imbalanced, the accuracy of survival prognosis models remains a challenging issue for researchers. This study aims to develop a predictive model for the 5-year survivability of breast cancer patients and to discover relationships between certain predictive variables and survival. The dataset was obtained from the SEER database. First, the effectiveness of two synthetic oversampling methods, Borderline SMOTE and the Density-based Synthetic Oversampling method (DSO), is investigated for solving the class imbalance problem. Then a combination of particle swarm optimization (PSO) and correlation-based feature selection (CFS) is used to identify the most important predictive variables. Finally, to build a predictive model, three classifiers, decision tree (C4.5), Bayesian Network, and Logistic Regression, are applied to the cleaned dataset. Assessment metrics including accuracy, sensitivity, specificity, and G-mean are used to evaluate the performance of the proposed hybrid approach. In addition, the area under the ROC curve (AUC) is used to evaluate the performance of the feature selection method. Results show that, among all combinations, DSO + PSO_CFS + C4.5 achieves the best performance in terms of accuracy, sensitivity, G-mean, and AUC, with values of 94.33%, 0.930, 0.939, and 0.939, respectively.
UR  - https://jad.shahroodut.ac.ir/article_1058.html
L1  - https://jad.shahroodut.ac.ir/article_1058_2647c65fe9ab0e31072d03a6fb22fdc2.pdf
ER  -