H.6.3.2. Feature evaluation and selection
Sayyed Mohammad Hoseini; Majid Ebtia; Mohanna Dehgardi
Abstract
The abundance of high-dimensional datasets, together with the computational limits of data analysis methods on such data, has made clear the importance of developing feature selection methods. Irrelevant variables degrade prediction, and redundant attributes add unnecessary computation, so classifiers produce poor results or perform poorly. Feature selection is therefore applied to facilitate a better understanding of the datasets, reduce computational time, and enhance prediction accuracy. In this research, we develop a composite method for feature selection that combines support vector machines and principal component analysis. The method is then applied to the k-nearest neighbor and Naïve Bayes algorithms. Three datasets from the UCI Machine Learning Repository are used to assess the performance of the proposed models. Additionally, a dataset gathered from the central library of Ayatollah Boroujerdi University is considered. This dataset comprises 1,910 instances with 30 attributes, including gender, native status, entry term, faculty code, cumulative GPA, and the number of books borrowed. After applying the proposed feature selection method, an accuracy of 70% was obtained with only five features. Experimental results demonstrate that the proposed feature selection method selects an appropriate feature subset. The approach yields enhanced classification performance, as evaluated by metrics such as accuracy, F-score, and Matthews correlation coefficient.
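The evaluation metrics named in the abstract can be illustrated with a minimal sketch for the binary case. This is not the authors' code. The function names and the toy labels are hypothetical, and the F1 variant of the F-score is an assumption, since the abstract does not say which variant was used.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + tn + fn
    return (tp + tn) / total if total else 0.0


def f1_score(tp, fp, tn, fn):
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels, not data from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
print("accuracy:", accuracy(*counts))
print("F1:", f1_score(*counts))
print("MCC:", matthews_corrcoef(*counts))
```

Unlike accuracy, the Matthews correlation coefficient uses all four cells of the confusion matrix and ranges from -1 to 1, which makes it informative when classes are imbalanced.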