Document Type: Original/Review Paper

Authors

Faculty of Computer Engineering, Shahrood University of Technology, Shahrood, Iran.

Abstract

Finding an effective way to combine the base learners is an essential part of constructing a heterogeneous ensemble of classifiers. In this paper, we propose a framework for heterogeneous ensembles that uses an artificial neural network to learn a nonlinear combination of the base classifiers. In the proposed framework, a set of heterogeneous classifiers is stacked to produce the first-level outputs; these outputs are then augmented with several combination functions to construct the inputs of the second-level classifier. We conduct extensive experiments on 121 datasets and compare the proposed method with established and state-of-the-art heterogeneous methods. The results demonstrate that the proposed scheme outperforms many heterogeneous ensembles and is superior to individually tuned single classifiers. The proposed method also performs notably better than several homogeneous ensembles. Our findings suggest that the improvements are even more pronounced on larger datasets.
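As a rough illustration of the pipeline described above, the following sketch stacks a few heterogeneous base classifiers, augments their out-of-fold probability outputs with simple combination functions, and trains a small neural network as the second-level combiner. This is a minimal sketch only, not the authors' implementation: the dataset, the choice of base classifiers, and the particular combination functions (mean, max, product, vote counts) are assumptions made for illustration.

```python
# Sketch: stacked heterogeneous ensemble with a neural-network combiner trained on
# base-classifier probabilities augmented with simple combination functions.
# All concrete choices below (dataset, base learners, combination functions) are
# illustrative assumptions, not the paper's exact configuration.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier


def meta_features(prob_list):
    """Concatenate per-classifier class probabilities and append combination functions."""
    P = np.stack(prob_list, axis=0)              # (n_classifiers, n_samples, n_classes)
    stacked = np.concatenate(prob_list, axis=1)  # raw first-level outputs
    mean = P.mean(axis=0)                        # average rule
    maximum = P.max(axis=0)                      # max rule
    product = P.prod(axis=0)                     # product rule
    votes = np.zeros_like(mean)
    for preds in P.argmax(axis=2):               # each classifier's predicted labels
        votes[np.arange(len(preds)), preds] += 1  # majority-vote counts per class
    return np.hstack([stacked, mean, maximum, product, votes])


X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

base = [
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    DecisionTreeClassifier(random_state=0),
    GaussianNB(),
]

# Out-of-fold probabilities on the training set avoid leaking labels to the combiner.
train_probs = [cross_val_predict(c, X_tr, y_tr, cv=5, method="predict_proba") for c in base]
for c in base:
    c.fit(X_tr, y_tr)
test_probs = [c.predict_proba(X_te) for c in base]

# Second-level classifier: a small neural network learning a nonlinear combination.
combiner = MLPClassifier(hidden_layer_sizes=(32,), max_iter=2000, random_state=0)
combiner.fit(meta_features(train_probs), y_tr)
print("ensemble accuracy:", combiner.score(meta_features(test_probs), y_te))
```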

Keywords
