An Evolutionary Multi-objective Discretization based on Normalized Cut

Document Type: Research/Original/Regular Article

Authors

Electrical and Computer Engineering Department, Yazd University, Yazd, Iran.

10.22044/jadm.2019.8507.1989

Abstract

Learning models and related results depend on the quality of the input data. If raw data is not properly cleaned and structured, the results are tending to be incorrect. Therefore, discretization as one of the preprocessing techniques plays an important role in learning processes. The most important challenge in the discretization process is to reduce the number of features’ values. This operation should be applied in a way that relationships between the features are maintained and accuracy of the classification algorithms would increase. In this paper, a new evolutionary multi-objective algorithm is presented. The proposed algorithm uses three objective functions to achieve high-quality discretization. The first and second objectives minimize the number of selected cut points and classification error, respectively. The third objective introduces a new criterion called the normalized cut, which uses the relationships between their features’ values to maintain the nature of the data. The performance of the proposed algorithm was tested using 20 benchmark datasets. According to the comparisons and the results of nonparametric statistical tests, the proposed algorithm has a better performance than other existing major methods.

Keywords

Main Subjects


[1] Pouramini A, Khaje Hassani S, Nasiri S (2018). Data Extraction using Content-Based Handles, Journal of AI and Data Mining, vol. 6, pp. 399-407. doi:10.22044/jadm.2017.990

[2] Ramírez-Gallego S, García S, Benítez JM, Herrera F (2018). A distributed evolutionary multivariate discretizer for big data processing on apache spark, Swarm and Evolutionary Computation, vol. 38, pp. 240-250.

[3] García-Gil D, Ramírez-Gallego S, García S, Herrera F (2018). Principal components analysis random discretization ensemble for big data, Knowledge-Based Systems, vol. 150, pp. 166-174.

[4] Ramírez-Gallego S, García S, Herrera F (2018). Online entropy-based discretization for data streaming classification, Future Generation Computer Systems, vol. 86, pp. 59-70.

[5] Ramírez-Gallego S, García S, Benítez JM, Herrera F (2016). Multivariate discretization based on evolutionary cut points selection for classification, IEEE transactions on cybernetics, vol. 46, pp. 595-608.

[6] Tahan MH, Asadi S (2018). MEMOD: a novel multivariate evolutionary multi-objective discretization, Soft Computing, vol. 22, pp. 301-323. doi:10.1007/s00500-016-2475-5

[7] Tahan MH, Asadi S (2018). EMDID: Evolutionary multi-objective discretization for imbalanced datasets, Information Sciences, vol. 432, pp. 442-461. doi: 10.1016/j.ins.2017.12.023

[8] Fayyad U, Irani K (1993). Multi-interval discretization of continuous-valued attributes for classification learning.

[9] Kurgan LA, Cios KJ (2004). CAIM discretization algorithm, IEEE transactions on Knowledge and Data Engineering, vol. 16, pp. 145-153.

[10] Tay FE, Shen L (2002). A modified chi2 algorithm for discretization, IEEE Transactions on Knowledge & Data Engineering, pp. 666-670.

[11] Sriwanna K, Boongoen T, Iam-On N (2018). Graph clustering-based discretization approach to microarray data, Knowledge and Information Systems, pp. 1-28.

[12] Kim K-j, Han I (2000). Genetic algorithms approach to feature discretization in artificial neural networks for the prediction of stock price index, Expert systems with Applications, vol. 19, pp. 125-132.

[13] Yang H, Wang J, Shao X, Wang NS (2007). Information System Continuous Attribute Discretization Based on Binary Particle Swarm Optimization. In: Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), 2007. IEEE, pp. 173-177.

[14] García S, López V, Luengo J, Carmona CJ, Herrera F (2012). A Preliminary Study on Selecting the Optimal Cut Points in Discretization by Evolutionary Algorithms. In: ICPRAM (1), 2012. pp. 211-216.

[15] Zamudio-Reyes R, Cruz-Ramírez N, Mezura-Montes E (2017). A Multivariate Discretization Algorithm Based on Multiobjective Optimization. In: 2017 International Conference on Computational Science and Computational Intelligence (CSCI), 2017. IEEE, pp. 375-380.

[16] Lotfi S, Karimi F (2017). A Hybrid MOEA/D-TS for Solving Multi-Objective Problems, Journal of AI and Data Mining, vol. 5, pp. 183-195. doi:10.22044/jadm.2017.886

[17] Coello CAC, Lamont GB, Van Veldhuizen DA (2007). Evolutionary algorithms for solving multi-objective problems, vol 5. Springer. doi:10.1007/978-0-387-36797-2.

[18] Ngatchou P, Zarei A, El-Sharkawi A (2005). Pareto multi objective optimization. In: Proceedings of the 13th International Conference on, Intelligent Systems Application to Power Systems, 2005. IEEE, pp. 84-91. doi:10.1109/ISAP.2005.1599245.

[19] Tahan MH, Ghasemzadeh M (2019). An evolutionary multi-objective algorithm for constructing balanced spanning tree, Journal of Soft Computing and Information Technology (JSCIT).

[20] Sriwanna K, Boongoen T, Iam-On N (2017). Graph clustering-based discretization of splitting and merging methods(GraphS and GraphM),Human-centric Computing and Information Sciences, vol. 7, p. 21.

[21] Deb K, Jain H (2014). An evolutionary many-objective optimization algorithm using reference-point-based nondominated sorting approach, part I: solving problems with box constraints, IEEE Transactions on Evolutionary Computation, vol. 18, pp. 577-601.

[22] Blake CL (1998). UCI Repository of machine learning databases, Irvine, University of California, http://www ics uci edu/~ mlearn/MLRepository html.

[23] Sheskin DJ (2003). Handbook of parametric and nonparametric statistical procedures. crc Press,

[24] García S, Fernández A, Luengo J, Herrera F (2009). A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability, Soft Computing, vol. 13, p. 959.

[25] Holm S (1979). A simple sequentially rejective multiple test procedure, Scandinavian journal of statistics, pp. 65-70.

[26] Hochberg Y (1988). A sharper Bonferroni procedure for multiple tests of significance, Biometrika, vol. 75, pp. 800-802.

[27] Hommel G (1988). A stagewise rejective multiple test procedure based on a modified Bonferroni test, Biometrika, vol. 75, pp. 383-386.

[28] Holland BS, Copenhaver MD (1987). An improved sequentially rejective Bonferroni test procedure, Biometrics, pp. 417-423.

[29] Rom DM (1990). A sequentially rejective test procedure based on a modified Bonferroni inequality, Biometrika, vol. 77, pp. 663-665.

[30] Finner H (1993). On a monotonicity problem in step-down multiple test procedures, Journal of the American Statistical Association, vol. 88, pp. 920-923.