Document Type: Original/Review Paper

Authors

1 Department of Biomedical Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.

2 Department of Biomedical Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.

3 Department of Electrical Engineering, Mashhad Branch, Islamic Azad University, Mashhad, Iran.

Abstract

Finding and selecting relevant features without class labels, through Unsupervised Feature Selection (UFS) approaches, has recently become necessary in a wide range of data analysis research. Although several open-source toolboxes provide feature selection techniques that reduce feature redundancy, data dimensionality, and computational cost, these tools require programming knowledge, which limits their adoption, and they do not adequately address unlabeled real-world data. This study proposes the Automatic UFS Toolbox (Auto-UFSTool), a user-friendly, fully automatic MATLAB toolbox that gathers 25 robust UFS approaches from recent research, most of them developed within the last five years. It therefore makes a clear and systematic comparison of competing methods feasible without writing a single line of code: through its Graphical User Interface (GUI), even users with no prior programming experience can run the implementations directly. The toolbox also supports evaluating feature selection results and generating graphs that facilitate the comparison of feature subsets of varying sizes. It is freely available to the general public, with scripts and source code for each technique, in the MATLAB File Exchange repository at: bit.ly/AutoUFSTool
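Although the toolbox itself is driven entirely through its GUI, it may help to see what one of the unsupervised criteria commonly included in such collections looks like in code. Below is a minimal MATLAB sketch of the classic Laplacian score of He, Cai, and Niyogi; the function name, interface, and parameters are illustrative assumptions for this sketch, not the toolbox's actual implementation.

function [ranked, score] = laplacian_score_demo(X, k, sigma)
% LAPLACIAN_SCORE_DEMO  Simplified Laplacian-score feature ranking (sketch).
%   X     : n-by-d matrix of unlabeled samples (rows)
%   k     : number of nearest neighbours in the similarity graph
%   sigma : heat-kernel bandwidth
% Smaller scores mark features that better preserve local structure.
n  = size(X, 1);
sq = sum(X.^2, 2);
D2 = max(sq + sq' - 2 * (X * X'), 0);   % pairwise squared distances (R2016b+)
W  = exp(-D2 ./ (2 * sigma^2));         % heat-kernel affinities
[~, idx] = sort(D2, 2);                 % idx(i,1) is sample i itself
mask = false(n);
for i = 1:n
    mask(i, idx(i, 2:k+1)) = true;      % connect i to its k nearest neighbours
end
W = W .* (mask | mask');                % symmetrised kNN graph
d = sum(W, 2);                          % node degrees
L = diag(d) - W;                        % unnormalised graph Laplacian
score = zeros(1, size(X, 2));
for r = 1:size(X, 2)
    f = X(:, r);
    f = f - (f' * d) / sum(d);          % remove the degree-weighted mean
    score(r) = (f' * L * f) / max(f' * (d .* f), eps);
end
[~, ranked] = sort(score, 'ascend');    % best features first
end

For instance, ranked = laplacian_score_demo(X, 5, 1) orders the columns of X from most to least locality-preserving, and the leading entries of ranked define the selected subset.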
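The abstract also mentions graphs that compare feature subsets of varying sizes. The fragment below sketches one common way such a curve can be drawn, clustering the top-ranked features with k-means and scoring each subset with the mean silhouette value (both functions are part of MATLAB's Statistics and Machine Learning Toolbox); the cluster count, subset sizes, and choice of the silhouette index are assumptions of this sketch, not necessarily the evaluation the toolbox performs.

% Compare subsets of increasing size using an internal validity index.
% Assumes X and ranked from the sketch above; 3 clusters is a placeholder.
sizes   = 5:5:50;
quality = zeros(size(sizes));
for t = 1:numel(sizes)
    Xs  = X(:, ranked(1:sizes(t)));          % keep the top-ranked features
    lab = kmeans(Xs, 3, 'Replicates', 5);    % cluster without class labels
    quality(t) = mean(silhouette(Xs, lab));  % higher = better-separated clusters
end
plot(sizes, quality, '-o');
xlabel('Number of selected features');
ylabel('Mean silhouette value');

Curves of this kind make it easy to spot the subset size beyond which adding features no longer improves, or even degrades, the clustering structure.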
